Thread: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

From
"J. Greg Davidson"
Date:
E.1.2.2. Casting            * Tighten casting checks for domains based on arrays (Tom Lane)               When a domain
isbased on an array type,..., such a domain type       is no longer allowed to match an anyarray parameter of a
polymorphicfunction, except by explicitly downcasting it to the       base array type.        
 
This will require me to add hundreds of casts to my code.  I do not get
how this will "Tighten casting checks".  It will certainly not tighten
my code!  Could you explain how it is good to not be able to do array
operations with a type which is an array?

BTW: All of my DOMAINs which are array types exist because of
PostgreSQL's inability to infer array types for DOMAINs, so I
have lots of code like this:

CREATE DOMAIN foo_ids AS integer;
CREATE DOMAIN foo_id_arrays AS integer[];

I would love to be able to simply use foo_ids[] instead of
having to have the second DOMAIN foo_id_arrays.

If there is some value which I'm missing in the above "Tighten"ing,
perhaps it could be put in *after* PostgreSQL were given the ability
to understand foo_ids[] as an array of foo_ids.

Thanks,

_Greg



Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

From
Merlin Moncure
Date:
On Mon, May 9, 2011 at 6:25 PM, J. Greg Davidson <greg@ngender.net> wrote:
>
> E.1.2.2. Casting
>
>      * Tighten casting checks for domains based on arrays (Tom Lane)
>
>        When a domain is based on an array type,..., such a domain type
>        is no longer allowed to match an anyarray parameter of a
>        polymorphic function, except by explicitly downcasting it to the
>        base array type.
>
> This will require me to add hundreds of casts to my code.  I do not get
> how this will "Tighten casting checks".  It will certainly not tighten
> my code!  Could you explain how it is good to not be able to do array
> operations with a type which is an array?
>
> BTW: All of my DOMAINs which are array types exist because of
> PostgreSQL's inability to infer array types for DOMAINs, so I
> have lots of code like this:
>
> CREATE DOMAIN foo_ids AS integer;
> CREATE DOMAIN foo_id_arrays AS integer[];
>
> I would love to be able to simply use foo_ids[] instead of
> having to have the second DOMAIN foo_id_arrays.
>
> If there is some value which I'm missing in the above "Tighten"ing,
> perhaps it could be put in *after* PostgreSQL were given the ability
> to understand foo_ids[] as an array of foo_ids.

I didn't read the thread that led up to this change (see:
http://postgresql.1045698.n5.nabble.com/Domains-versus-arrays-versus-typmods-td3227701.html)
but if I had, I would have argued that the problem cases listed, the
worst being the failed constraint check, do not justify the
compatibility break :(.  In the pre-unnest() world, you might have
gotten away with it, but it's been out for two released versions (and
some ad hoc approaches for longer than that) and perhaps we were too
quick on the trigger.  Considering we've already got a report during
beta1, this does not exactly inspire confidence.

We've got other cases of known bugs where a good solution is unclear
or breaks unknown amounts of user code.  The giant clusterfark
surrounding is null and composites comes to mind.

merlin


"J. Greg Davidson" <greg@ngender.net> writes:
>       * Tighten casting checks for domains based on arrays (Tom Lane)        
>         When a domain is based on an array type,..., such a domain type
>         is no longer allowed to match an anyarray parameter of a
>         polymorphic function, except by explicitly downcasting it to the
>         base array type.        
> This will require me to add hundreds of casts to my code.  I do not get
> how this will "Tighten casting checks".  It will certainly not tighten
> my code!  Could you explain how it is good to not be able to do array
> operations with a type which is an array?

The discussion that led up to that decision is in this thread:
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01362.php
specifically here:
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php

The previous behavior was clearly broken.  The new behavior is at least
consistent.  It might be more user-friendly if we did automatic
downcasts in these cases, but we were not (and still are not) doing
automatic downcasts for domains over scalar types in comparable cases,
so it's not very clear why domains over array types should be treated
differently.

To be concrete, consider the function array_append(anyarray, anyelement)
yielding anyarray.  Suppose we have a domain D over int[] and the call
array_append(var_of_type_D, 42).  If we automatically downcast the
variable to int[], should the result of the function be considered to be
of type D, or type int[]?  This isn't a trivial distinction because
choosing to consider it of type D means we have to re-check D's domain
constraints, which might or might not be satisfied by the modified
array.  Previous releases considered the result to be of type D,
*without* rechecking the domain constraints, which was flat out wrong.

So we basically had three alternatives to make it better:* downcast to the array type, which would possibly silently
breakapplications that were relying on the function result  being considered of the domain type* re-apply domain checks
onthe function result, which would be  a performance hit and possibly again result in unobvious  breakage* explicitly
breakit by throwing a parse error until you  downcast (and then upcast the function result if you want)
 
I realize that #3 is a bit unpleasant, but are either of the other two
better?  At least #3 shows you where you need to check for problems.

There is another issue that wasn't really mentioned in the previous
thread, which is that if we are matching a domain-over-array to a
function's ANYARRAY argument, what exactly should be allowed to match to
ANYELEMENT --- or if the function returns ANYELEMENT, what should the
imputed result type be?  AFAICS it's impossible to give an answer to
that without effectively deciding that function argument matching
smashes the domain to its base type (the array type).  It's not very
clear what's the point of a domain type if every operation on it is
going to neglect its domain-ness.
        regards, tom lane


Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

From
"J. Greg Davidson"
Date:
Given:
 CREATE DOMAIN int_array AS int[];

The operator [] works fine in 4.1beta1:
 SELECT (ARRAY[1,2,3]::int_array)[1];

proving that int_array is an array type with element type int.

It is inconsistent that other array functions and operators don't work.

On Mon, 2011-05-09 at 23:32 -0400, Tom Lane wrote:

> So we basically had three alternatives to make it better:
>     #1 downcast to the array type, which would possibly silently
>       break applications that were relying on the function result
>       being considered of the domain type

I do not think of this as "Downcasting int_array to int[]" but as
allowing an ANYARRAY to match int_array which is an array type.
Since no cast is logically required, the return type is the same
as the first argument type, as expected and as PostgreSQL has done
for some time.

>     #2 re-apply domain checks on the function result, which would be
>       a performance hit and possibly again result in unobvious
>       breakage

If the function result is a new value then nothing is being re-applied.
If it is an existing value of the domain type which was passed in or
extracted from a data structure, then the domain checks have already
been applied.  This is a red herring.

>     #3 explicitly break it by throwing a parse error until you
>       downcast (and then upcast the function result if you want)

> I realize that #3 is a bit unpleasant, but are either of the other two
> better?  At least #3 shows you where you need to check for problems.

Wrapping most (but not all) of your array operations in downcasts and
upcasts is horrible.

> There is another issue that wasn't really mentioned in the previous
> thread, which is that if we are matching a domain-over-array to a
> function's ANYARRAY argument, what exactly should be allowed to match to
> ANYELEMENT --- or if the function returns ANYELEMENT, what should the
> imputed result type be?

Since PostgreSQL allows indexing of the domain type, we already know
the answer.  I don't even get why there is confusion abou the
element type of an array.

> AFAICS it's impossible to give an answer to
> that without effectively deciding that function argument matching
> smashes the domain to its base type (the array type).  It's not very
> clear what's the point of a domain type if every operation on it is
> going to neglect its domain-ness.

Yes, what is the point of neglecting the domain-ness of a domain type
by being forced to downcast it to an unchecked type before (some) array
operations?  If a value is being constructed of a domain-type which has
constraints, check them.  When I don't want the security of a domain
type I can cast it to its representation type before I passed it, but
it seems bizarre to be required to do such a thing!

I did read the previous threads some time ago. They seemed mostly to be
concerned with discussing the internal implementation of these matters
and the typmod feature (which I still don't understand).  The internal
algorithms and deta structures which PostgreSQL uses to internally
represent SQL types and operations are a weak justification for
PostgreSQL's behavior - they can be changed if they are wrong.

I am still hoping to get rid of my domains which are arrays when
PostgreSQL supports arrays of elements which are of domain types.
Could we at least defer this change until that is done?

_Greg


On Mon, May 9, 2011 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "J. Greg Davidson" <greg@ngender.net> writes:
>>       * Tighten casting checks for domains based on arrays (Tom Lane)
>
>>         When a domain is based on an array type,..., such a domain type
>>         is no longer allowed to match an anyarray parameter of a
>>         polymorphic function, except by explicitly downcasting it to the
>>         base array type.
>
>> This will require me to add hundreds of casts to my code.  I do not get
>> how this will "Tighten casting checks".  It will certainly not tighten
>> my code!  Could you explain how it is good to not be able to do array
>> operations with a type which is an array?
>
> The discussion that led up to that decision is in this thread:
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01362.php
> specifically here:
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php
>
> The previous behavior was clearly broken.  The new behavior is at least
> consistent.  It might be more user-friendly if we did automatic
> downcasts in these cases, but we were not (and still are not) doing
> automatic downcasts for domains over scalar types in comparable cases,
> so it's not very clear why domains over array types should be treated
> differently.
>
> To be concrete, consider the function array_append(anyarray, anyelement)
> yielding anyarray.  Suppose we have a domain D over int[] and the call
> array_append(var_of_type_D, 42).  If we automatically downcast the
> variable to int[], should the result of the function be considered to be
> of type D, or type int[]?  This isn't a trivial distinction because
> choosing to consider it of type D means we have to re-check D's domain
> constraints, which might or might not be satisfied by the modified
> array.  Previous releases considered the result to be of type D,
> *without* rechecking the domain constraints, which was flat out wrong.
>
> So we basically had three alternatives to make it better:
>        * downcast to the array type, which would possibly silently
>          break applications that were relying on the function result
>          being considered of the domain type
>        * re-apply domain checks on the function result, which would be
>          a performance hit and possibly again result in unobvious
>          breakage
>        * explicitly break it by throwing a parse error until you
>          downcast (and then upcast the function result if you want)
> I realize that #3 is a bit unpleasant, but are either of the other two
> better?  At least #3 shows you where you need to check for problems.

Aren't any applications that would be broken by #1 broken already?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, May 9, 2011 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So we basically had three alternatives to make it better:
>>        * downcast to the array type, which would possibly silently
>>          break applications that were relying on the function result
>>          being considered of the domain type
>>        * re-apply domain checks on the function result, which would be
>>          a performance hit and possibly again result in unobvious
>>          breakage
>>        * explicitly break it by throwing a parse error until you
>>          downcast (and then upcast the function result if you want)
>> I realize that #3 is a bit unpleasant, but are either of the other two
>> better?  At least #3 shows you where you need to check for problems.

> Aren't any applications that would be broken by #1 broken already?

My point is that doing #1 would break them *silently* --- if you did
have a problem, figuring out what it was could require a great deal
of sleuthing.
        regards, tom lane


On Tue, May 10, 2011 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, May 9, 2011 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So we basically had three alternatives to make it better:
>>>        * downcast to the array type, which would possibly silently
>>>          break applications that were relying on the function result
>>>          being considered of the domain type
>>>        * re-apply domain checks on the function result, which would be
>>>          a performance hit and possibly again result in unobvious
>>>          breakage
>>>        * explicitly break it by throwing a parse error until you
>>>          downcast (and then upcast the function result if you want)
>>> I realize that #3 is a bit unpleasant, but are either of the other two
>>> better?  At least #3 shows you where you need to check for problems.
>
>> Aren't any applications that would be broken by #1 broken already?
>
> My point is that doing #1 would break them *silently* --- if you did
> have a problem, figuring out what it was could require a great deal
> of sleuthing.

Eh, I'm confused.  Explain further?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 10, 2011 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Mon, May 9, 2011 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> So we basically had three alternatives to make it better:
>>>>        * downcast to the array type, which would possibly silently
>>>>          break applications that were relying on the function result
>>>>          being considered of the domain type

>>> Aren't any applications that would be broken by #1 broken already?

>> My point is that doing #1 would break them *silently* --- if you did
>> have a problem, figuring out what it was could require a great deal
>> of sleuthing.

> Eh, I'm confused.  Explain further?

The previous behavior was effectively to allow a domain-over-array type
to match the ANYARRAY symbol, without doing anything else special with
it.  In particular if the function returned ANYARRAY then its output
would be taken to be of the domain type, which is wrong since the
function might produce an array value that doesn't meet the domain's
constraints.

We could, and perhaps should, instead downcast the domain to the array
type, which would imply that ANYARRAY is matching the base type not the
domain, and in particular that a declared ANYARRAY result type means the
base type not the domain.  The things that were bothering me about this
at the time were (1) it would be a silent change of behavior, and (2)
it doesn't seem very consistent to handle domain-to-ANYARRAY matching
this way without also doing something with domain-to-ANYELEMENT
matching.

An example of the inconsistency is that something like
    create domain myi as int;    select array[1,2,3] || 4::myi;

fails with "operator does not exist: integer[] || myi", not only in HEAD
but all recent releases.  If we're going to downcast a domain-over-array
to plain array to allow it to be used with array_append, it's not clear
why we don't allow myi to be automatically downcast to int for the same
purpose.

However, exactly what we ought to do instead isn't entirely clear, and
when I brought it up back in October no one seemed to care enough to
pursue the matter.  So I just left both cases as throwing error, which
seemed the most conservative course.

I'm still willing to talk about alternatives, though it seems a bit
late in the 9.1 cycle to be designing behaviors.
        regards, tom lane


On Mon, May 09, 2011 at 11:32:28PM -0400, Tom Lane wrote:
> To be concrete, consider the function array_append(anyarray, anyelement)
> yielding anyarray.  Suppose we have a domain D over int[] and the call
> array_append(var_of_type_D, 42).  If we automatically downcast the
> variable to int[], should the result of the function be considered to be
> of type D, or type int[]?  This isn't a trivial distinction because
> choosing to consider it of type D means we have to re-check D's domain
> constraints, which might or might not be satisfied by the modified
> array.  Previous releases considered the result to be of type D,
> *without* rechecking the domain constraints, which was flat out wrong.
> 
> So we basically had three alternatives to make it better:
>     * downcast to the array type, which would possibly silently
>       break applications that were relying on the function result
>       being considered of the domain type
>     * re-apply domain checks on the function result, which would be
>       a performance hit and possibly again result in unobvious
>       breakage
>     * explicitly break it by throwing a parse error until you
>       downcast (and then upcast the function result if you want)
> I realize that #3 is a bit unpleasant, but are either of the other two
> better?  At least #3 shows you where you need to check for problems.

Though I've never used a domain over an array type, I'd strongly prefer #2.  As
best I can tell, the only functions that could need to do this are those that
have an anyarray argument and also an anyarray return type.  In core, that's
array_append, array_prepend, array_cat, array_larger, array_smaller, max, and
min.  The last four always return an input unchanged, so only the first three
would actually check anything.  I'm not seeing, offhand, any need to add new
validation to PLs.

The parse-time breakage of #3 is a nice tool during your upgrade QA, but the
long-term semantics are considerably worse for wear.

I see nothing to recommend #1.

> There is another issue that wasn't really mentioned in the previous
> thread, which is that if we are matching a domain-over-array to a
> function's ANYARRAY argument, what exactly should be allowed to match to
> ANYELEMENT --- or if the function returns ANYELEMENT, what should the
> imputed result type be?

What else but the type seen by subscripting a datum of the ANYARRAY type?

> AFAICS it's impossible to give an answer to
> that without effectively deciding that function argument matching
> smashes the domain to its base type (the array type).

The domain and its base type share an element type, but how does that
necessitate any further removal of domain-ness?

> It's not very
> clear what's the point of a domain type if every operation on it is
> going to neglect its domain-ness.

Agreed.

nm


Noah Misch <noah@leadboat.com> writes:
> On Mon, May 09, 2011 at 11:32:28PM -0400, Tom Lane wrote:
>> So we basically had three alternatives to make it better:
>>     * downcast to the array type, which would possibly silently
>>       break applications that were relying on the function result
>>       being considered of the domain type
>>     * re-apply domain checks on the function result, which would be
>>       a performance hit and possibly again result in unobvious
>>       breakage
>>     * explicitly break it by throwing a parse error until you
>>       downcast (and then upcast the function result if you want)
>> I realize that #3 is a bit unpleasant, but are either of the other two
>> better?  At least #3 shows you where you need to check for problems.

> Though I've never used a domain over an array type, I'd strongly prefer #2.

Hmm.  I hadn't seriously considered that alternative, but we could go in
that direction.  Logically, what this would probably imply is inserting
CastToDomain whenever the result of a polymorphic function is deemed to
be of a domain type, whether the base type is array or not.

The reason I hadn't taken it very seriously is that I don't think it's
actually going to end up being consistent.  If we don't do #1 (downcast
polymorphic arguments to a base type), but consider the arguments passed
to the function to be of the domain type, then really we have to expect
the polymorphic function to enforce domain constraints internally; we
cannot fix it with something as localized as having the function call
parser stick a CastToDomain on top.  Here's a possibly rather silly
example:
create function negate(anyelement) returns anyelement as$$ select - $1 $$ language sql;
create domain pos as int check (value > 0);
select negate(42::pos);

This negate() function will work for any type that has a unary minus
operator.  But the result of the unary minus operation cannot sanely be
considered to be of this domain type.  In this simplified example you
might feel it doesn't matter, since with an external CastToDomain we'd
throw error anyway a moment later, as soon as control comes back from
the function.  But what if the function does further operations with the
value, such as passing it to another polymorphic function?

So really, if you go down this path, you end up concluding that PLs
supporting polymorphic arguments had better be prepared to enforce
domain constraints all the way through, and thus there should be no need
for an external CastToDomain --- what comes back from the function ought
to be checked already.  Unfortunately, even if the PLs do that (SQL
functions might get it right, but I'm not real sure whether plpgsql is
water-tight on this, and I don't trust the other PLs for it at all),
there's no way that built-in polymorphic functions like array_append are
going to.

So on the whole, #2 looks like an implementation quagmire to me: it's
not clear what to check, or where, or how you know when you're done.
I'm not willing to volunteer my own time to make it work that way.
If somebody else who uses domains a lot wants to step up and take
responsibility, go for it.
        regards, tom lane


On Wed, May 11, 2011 at 10:22:01AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Mon, May 09, 2011 at 11:32:28PM -0400, Tom Lane wrote:
> >> So we basically had three alternatives to make it better:
> >>     * downcast to the array type, which would possibly silently
> >>       break applications that were relying on the function result
> >>       being considered of the domain type
> >>     * re-apply domain checks on the function result, which would be
> >>       a performance hit and possibly again result in unobvious
> >>       breakage
> >>     * explicitly break it by throwing a parse error until you
> >>       downcast (and then upcast the function result if you want)
> >> I realize that #3 is a bit unpleasant, but are either of the other two
> >> better?  At least #3 shows you where you need to check for problems.
> 
> > Though I've never used a domain over an array type, I'd strongly prefer #2.
> 
> Hmm.  I hadn't seriously considered that alternative, but we could go in
> that direction.  Logically, what this would probably imply is inserting
> CastToDomain whenever the result of a polymorphic function is deemed to
> be of a domain type, whether the base type is array or not.
> 
> The reason I hadn't taken it very seriously is that I don't think it's
> actually going to end up being consistent.  If we don't do #1 (downcast
> polymorphic arguments to a base type), but consider the arguments passed
> to the function to be of the domain type, then really we have to expect
> the polymorphic function to enforce domain constraints internally; we
> cannot fix it with something as localized as having the function call
> parser stick a CastToDomain on top.  Here's a possibly rather silly
> example:
> 
>     create function negate(anyelement) returns anyelement as
>     $$ select - $1 $$ language sql;
> 
>     create domain pos as int check (value > 0);
> 
>     select negate(42::pos);
> 
> This negate() function will work for any type that has a unary minus
> operator.  But the result of the unary minus operation cannot sanely be
> considered to be of this domain type.

While simple, I think that example covers the salient features.  The git master
behavior is sound: "ERROR:  return type mismatch in function declared to return
pos DETAIL:  Actual return type is integer."  If you defined a unary minus
operator for the domain type itself, the function implementing that operator
would then gain responsibility for preserving any domain constraints.

Now, perhaps it's unfortunate that the example can't be easily rewritten to
actually work for arbitrary domain inputs (without rewriting it in C).  I don't
have any particular ideas for improving that.

> In this simplified example you
> might feel it doesn't matter, since with an external CastToDomain we'd
> throw error anyway a moment later, as soon as control comes back from
> the function.  But what if the function does further operations with the
> value, such as passing it to another polymorphic function?

The SQL PL understands that the value of -($1::pos) has type integer, as does
PL/pgSQL.  I'm not seeing any problems offhand.

> So really, if you go down this path, you end up concluding that PLs
> supporting polymorphic arguments had better be prepared to enforce
> domain constraints all the way through, and thus there should be no need
> for an external CastToDomain --- what comes back from the function ought
> to be checked already.

That was my conclusion.  I'm not aware of any particular holes in this area, but
I won't wager there are none.  The code that produces the domain-typed datums is
not usually part of the PL implementation, so the PL mostly needs to rigorously
track the provenance of its datums.

> Unfortunately, even if the PLs do that (SQL
> functions might get it right, but I'm not real sure whether plpgsql is
> water-tight on this, and I don't trust the other PLs for it at all),
> there's no way that built-in polymorphic functions like array_append are
> going to.
> 
> So on the whole, #2 looks like an implementation quagmire to me: it's
> not clear what to check, or where, or how you know when you're done.

Every C function is responsible for returning a datum consistent with its
declared return type: we do nothing in particular to ensure that a function has
done so.  I see domain constraints as a natural extension of that rule.  If a C
function purports to return a domain-typed datum, perhaps through polymorphism,
it's responsible for having checked the datum against that domain.  Every PL
accepts responsibility for enforcing this on behalf of its functions, much as it
enforces general consistency between return types and actual returned datums.

A function with a polymorphic argument returning the same polymorphic type is
consequently responsible for applying domain checks when the concrete type is a
domain.  (A function with an anyelement argument that returns anyarray is not a
problem, because the return type is never a domain.  Likewise for a function
with an anyarray argument that returns anyelement.)  Functions returning
anyarray are at the most risk, because it's actually practical to write a
function that fabricates an array of arbitrary type.  I can't come up with an
analogous example for a scalar type -- they end up delegating the fabrication to
type-specific code, like your unary negation example.

Again, I think that leaves just array_append, array_prepend, and array_cat
actually needing a new check.

> I'm not willing to volunteer my own time to make it work that way.
> If somebody else who uses domains a lot wants to step up and take
> responsibility, go for it.

I'm willing the implement the above behavior, if it seems reasonable.

nm