Thread: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
From
"J. Greg Davidson"
Date:
E.1.2.2. Casting * Tighten casting checks for domains based on arrays (Tom Lane) When a domain isbased on an array type,..., such a domain type is no longer allowed to match an anyarray parameter of a polymorphicfunction, except by explicitly downcasting it to the base array type. This will require me to add hundreds of casts to my code. I do not get how this will "Tighten casting checks". It will certainly not tighten my code! Could you explain how it is good to not be able to do array operations with a type which is an array? BTW: All of my DOMAINs which are array types exist because of PostgreSQL's inability to infer array types for DOMAINs, so I have lots of code like this: CREATE DOMAIN foo_ids AS integer; CREATE DOMAIN foo_id_arrays AS integer[]; I would love to be able to simply use foo_ids[] instead of having to have the second DOMAIN foo_id_arrays. If there is some value which I'm missing in the above "Tighten"ing, perhaps it could be put in *after* PostgreSQL were given the ability to understand foo_ids[] as an array of foo_ids. Thanks, _Greg
Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
From
Merlin Moncure
Date:
On Mon, May 9, 2011 at 6:25 PM, J. Greg Davidson <greg@ngender.net> wrote: > > E.1.2.2. Casting > > * Tighten casting checks for domains based on arrays (Tom Lane) > > When a domain is based on an array type,..., such a domain type > is no longer allowed to match an anyarray parameter of a > polymorphic function, except by explicitly downcasting it to the > base array type. > > This will require me to add hundreds of casts to my code. I do not get > how this will "Tighten casting checks". It will certainly not tighten > my code! Could you explain how it is good to not be able to do array > operations with a type which is an array? > > BTW: All of my DOMAINs which are array types exist because of > PostgreSQL's inability to infer array types for DOMAINs, so I > have lots of code like this: > > CREATE DOMAIN foo_ids AS integer; > CREATE DOMAIN foo_id_arrays AS integer[]; > > I would love to be able to simply use foo_ids[] instead of > having to have the second DOMAIN foo_id_arrays. > > If there is some value which I'm missing in the above "Tighten"ing, > perhaps it could be put in *after* PostgreSQL were given the ability > to understand foo_ids[] as an array of foo_ids. I didn't read the thread that led up to this change (see: http://postgresql.1045698.n5.nabble.com/Domains-versus-arrays-versus-typmods-td3227701.html) but if I had, I would have argued that the problem cases listed, the worst being the failed constraint check, do not justify the compatibility break :(. In the pre-unnest() world, you might have gotten away with it, but it's been out for two released versions (and some ad hoc approaches for longer than that) and perhaps we were too quick on the trigger. Considering we've already got a report during beta1, this does not exactly inspire confidence. We've got other cases of known bugs where a good solution is unclear or breaks unknown amounts of user code. The giant clusterfark surrounding is null and composites comes to mind. merlin
"J. Greg Davidson" <greg@ngender.net> writes: > * Tighten casting checks for domains based on arrays (Tom Lane) > When a domain is based on an array type,..., such a domain type > is no longer allowed to match an anyarray parameter of a > polymorphic function, except by explicitly downcasting it to the > base array type. > This will require me to add hundreds of casts to my code. I do not get > how this will "Tighten casting checks". It will certainly not tighten > my code! Could you explain how it is good to not be able to do array > operations with a type which is an array? The discussion that led up to that decision is in this thread: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01362.php specifically here: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php The previous behavior was clearly broken. The new behavior is at least consistent. It might be more user-friendly if we did automatic downcasts in these cases, but we were not (and still are not) doing automatic downcasts for domains over scalar types in comparable cases, so it's not very clear why domains over array types should be treated differently. To be concrete, consider the function array_append(anyarray, anyelement) yielding anyarray. Suppose we have a domain D over int[] and the call array_append(var_of_type_D, 42). If we automatically downcast the variable to int[], should the result of the function be considered to be of type D, or type int[]? This isn't a trivial distinction because choosing to consider it of type D means we have to re-check D's domain constraints, which might or might not be satisfied by the modified array. Previous releases considered the result to be of type D, *without* rechecking the domain constraints, which was flat out wrong. So we basically had three alternatives to make it better:* downcast to the array type, which would possibly silently breakapplications that were relying on the function result being considered of the domain type* re-apply domain checks onthe function result, which would be a performance hit and possibly again result in unobvious breakage* explicitly breakit by throwing a parse error until you downcast (and then upcast the function result if you want) I realize that #3 is a bit unpleasant, but are either of the other two better? At least #3 shows you where you need to check for problems. There is another issue that wasn't really mentioned in the previous thread, which is that if we are matching a domain-over-array to a function's ANYARRAY argument, what exactly should be allowed to match to ANYELEMENT --- or if the function returns ANYELEMENT, what should the imputed result type be? AFAICS it's impossible to give an answer to that without effectively deciding that function argument matching smashes the domain to its base type (the array type). It's not very clear what's the point of a domain type if every operation on it is going to neglect its domain-ness. regards, tom lane
Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
From
"J. Greg Davidson"
Date:
Given: CREATE DOMAIN int_array AS int[]; The operator [] works fine in 4.1beta1: SELECT (ARRAY[1,2,3]::int_array)[1]; proving that int_array is an array type with element type int. It is inconsistent that other array functions and operators don't work. On Mon, 2011-05-09 at 23:32 -0400, Tom Lane wrote: > So we basically had three alternatives to make it better: > #1 downcast to the array type, which would possibly silently > break applications that were relying on the function result > being considered of the domain type I do not think of this as "Downcasting int_array to int[]" but as allowing an ANYARRAY to match int_array which is an array type. Since no cast is logically required, the return type is the same as the first argument type, as expected and as PostgreSQL has done for some time. > #2 re-apply domain checks on the function result, which would be > a performance hit and possibly again result in unobvious > breakage If the function result is a new value then nothing is being re-applied. If it is an existing value of the domain type which was passed in or extracted from a data structure, then the domain checks have already been applied. This is a red herring. > #3 explicitly break it by throwing a parse error until you > downcast (and then upcast the function result if you want) > I realize that #3 is a bit unpleasant, but are either of the other two > better? At least #3 shows you where you need to check for problems. Wrapping most (but not all) of your array operations in downcasts and upcasts is horrible. > There is another issue that wasn't really mentioned in the previous > thread, which is that if we are matching a domain-over-array to a > function's ANYARRAY argument, what exactly should be allowed to match to > ANYELEMENT --- or if the function returns ANYELEMENT, what should the > imputed result type be? Since PostgreSQL allows indexing of the domain type, we already know the answer. I don't even get why there is confusion abou the element type of an array. > AFAICS it's impossible to give an answer to > that without effectively deciding that function argument matching > smashes the domain to its base type (the array type). It's not very > clear what's the point of a domain type if every operation on it is > going to neglect its domain-ness. Yes, what is the point of neglecting the domain-ness of a domain type by being forced to downcast it to an unchecked type before (some) array operations? If a value is being constructed of a domain-type which has constraints, check them. When I don't want the security of a domain type I can cast it to its representation type before I passed it, but it seems bizarre to be required to do such a thing! I did read the previous threads some time ago. They seemed mostly to be concerned with discussing the internal implementation of these matters and the typmod feature (which I still don't understand). The internal algorithms and deta structures which PostgreSQL uses to internally represent SQL types and operations are a weak justification for PostgreSQL's behavior - they can be changed if they are wrong. I am still hoping to get rid of my domains which are arrays when PostgreSQL supports arrays of elements which are of domain types. Could we at least defer this change until that is done? _Greg
On Mon, May 9, 2011 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "J. Greg Davidson" <greg@ngender.net> writes: >> * Tighten casting checks for domains based on arrays (Tom Lane) > >> When a domain is based on an array type,..., such a domain type >> is no longer allowed to match an anyarray parameter of a >> polymorphic function, except by explicitly downcasting it to the >> base array type. > >> This will require me to add hundreds of casts to my code. I do not get >> how this will "Tighten casting checks". It will certainly not tighten >> my code! Could you explain how it is good to not be able to do array >> operations with a type which is an array? > > The discussion that led up to that decision is in this thread: > http://archives.postgresql.org/pgsql-hackers/2010-10/msg01362.php > specifically here: > http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php > > The previous behavior was clearly broken. The new behavior is at least > consistent. It might be more user-friendly if we did automatic > downcasts in these cases, but we were not (and still are not) doing > automatic downcasts for domains over scalar types in comparable cases, > so it's not very clear why domains over array types should be treated > differently. > > To be concrete, consider the function array_append(anyarray, anyelement) > yielding anyarray. Suppose we have a domain D over int[] and the call > array_append(var_of_type_D, 42). If we automatically downcast the > variable to int[], should the result of the function be considered to be > of type D, or type int[]? This isn't a trivial distinction because > choosing to consider it of type D means we have to re-check D's domain > constraints, which might or might not be satisfied by the modified > array. Previous releases considered the result to be of type D, > *without* rechecking the domain constraints, which was flat out wrong. > > So we basically had three alternatives to make it better: > * downcast to the array type, which would possibly silently > break applications that were relying on the function result > being considered of the domain type > * re-apply domain checks on the function result, which would be > a performance hit and possibly again result in unobvious > breakage > * explicitly break it by throwing a parse error until you > downcast (and then upcast the function result if you want) > I realize that #3 is a bit unpleasant, but are either of the other two > better? At least #3 shows you where you need to check for problems. Aren't any applications that would be broken by #1 broken already? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, May 9, 2011 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So we basically had three alternatives to make it better: >> * downcast to the array type, which would possibly silently >> break applications that were relying on the function result >> being considered of the domain type >> * re-apply domain checks on the function result, which would be >> a performance hit and possibly again result in unobvious >> breakage >> * explicitly break it by throwing a parse error until you >> downcast (and then upcast the function result if you want) >> I realize that #3 is a bit unpleasant, but are either of the other two >> better? At least #3 shows you where you need to check for problems. > Aren't any applications that would be broken by #1 broken already? My point is that doing #1 would break them *silently* --- if you did have a problem, figuring out what it was could require a great deal of sleuthing. regards, tom lane
On Tue, May 10, 2011 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, May 9, 2011 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> So we basically had three alternatives to make it better: >>> * downcast to the array type, which would possibly silently >>> break applications that were relying on the function result >>> being considered of the domain type >>> * re-apply domain checks on the function result, which would be >>> a performance hit and possibly again result in unobvious >>> breakage >>> * explicitly break it by throwing a parse error until you >>> downcast (and then upcast the function result if you want) >>> I realize that #3 is a bit unpleasant, but are either of the other two >>> better? At least #3 shows you where you need to check for problems. > >> Aren't any applications that would be broken by #1 broken already? > > My point is that doing #1 would break them *silently* --- if you did > have a problem, figuring out what it was could require a great deal > of sleuthing. Eh, I'm confused. Explain further? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, May 10, 2011 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Mon, May 9, 2011 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> So we basically had three alternatives to make it better: >>>> * downcast to the array type, which would possibly silently >>>> break applications that were relying on the function result >>>> being considered of the domain type >>> Aren't any applications that would be broken by #1 broken already? >> My point is that doing #1 would break them *silently* --- if you did >> have a problem, figuring out what it was could require a great deal >> of sleuthing. > Eh, I'm confused. Explain further? The previous behavior was effectively to allow a domain-over-array type to match the ANYARRAY symbol, without doing anything else special with it. In particular if the function returned ANYARRAY then its output would be taken to be of the domain type, which is wrong since the function might produce an array value that doesn't meet the domain's constraints. We could, and perhaps should, instead downcast the domain to the array type, which would imply that ANYARRAY is matching the base type not the domain, and in particular that a declared ANYARRAY result type means the base type not the domain. The things that were bothering me about this at the time were (1) it would be a silent change of behavior, and (2) it doesn't seem very consistent to handle domain-to-ANYARRAY matching this way without also doing something with domain-to-ANYELEMENT matching. An example of the inconsistency is that something like create domain myi as int; select array[1,2,3] || 4::myi; fails with "operator does not exist: integer[] || myi", not only in HEAD but all recent releases. If we're going to downcast a domain-over-array to plain array to allow it to be used with array_append, it's not clear why we don't allow myi to be automatically downcast to int for the same purpose. However, exactly what we ought to do instead isn't entirely clear, and when I brought it up back in October no one seemed to care enough to pursue the matter. So I just left both cases as throwing error, which seemed the most conservative course. I'm still willing to talk about alternatives, though it seems a bit late in the 9.1 cycle to be designing behaviors. regards, tom lane
On Mon, May 09, 2011 at 11:32:28PM -0400, Tom Lane wrote: > To be concrete, consider the function array_append(anyarray, anyelement) > yielding anyarray. Suppose we have a domain D over int[] and the call > array_append(var_of_type_D, 42). If we automatically downcast the > variable to int[], should the result of the function be considered to be > of type D, or type int[]? This isn't a trivial distinction because > choosing to consider it of type D means we have to re-check D's domain > constraints, which might or might not be satisfied by the modified > array. Previous releases considered the result to be of type D, > *without* rechecking the domain constraints, which was flat out wrong. > > So we basically had three alternatives to make it better: > * downcast to the array type, which would possibly silently > break applications that were relying on the function result > being considered of the domain type > * re-apply domain checks on the function result, which would be > a performance hit and possibly again result in unobvious > breakage > * explicitly break it by throwing a parse error until you > downcast (and then upcast the function result if you want) > I realize that #3 is a bit unpleasant, but are either of the other two > better? At least #3 shows you where you need to check for problems. Though I've never used a domain over an array type, I'd strongly prefer #2. As best I can tell, the only functions that could need to do this are those that have an anyarray argument and also an anyarray return type. In core, that's array_append, array_prepend, array_cat, array_larger, array_smaller, max, and min. The last four always return an input unchanged, so only the first three would actually check anything. I'm not seeing, offhand, any need to add new validation to PLs. The parse-time breakage of #3 is a nice tool during your upgrade QA, but the long-term semantics are considerably worse for wear. I see nothing to recommend #1. > There is another issue that wasn't really mentioned in the previous > thread, which is that if we are matching a domain-over-array to a > function's ANYARRAY argument, what exactly should be allowed to match to > ANYELEMENT --- or if the function returns ANYELEMENT, what should the > imputed result type be? What else but the type seen by subscripting a datum of the ANYARRAY type? > AFAICS it's impossible to give an answer to > that without effectively deciding that function argument matching > smashes the domain to its base type (the array type). The domain and its base type share an element type, but how does that necessitate any further removal of domain-ness? > It's not very > clear what's the point of a domain type if every operation on it is > going to neglect its domain-ness. Agreed. nm
Re: Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes: > On Mon, May 09, 2011 at 11:32:28PM -0400, Tom Lane wrote: >> So we basically had three alternatives to make it better: >> * downcast to the array type, which would possibly silently >> break applications that were relying on the function result >> being considered of the domain type >> * re-apply domain checks on the function result, which would be >> a performance hit and possibly again result in unobvious >> breakage >> * explicitly break it by throwing a parse error until you >> downcast (and then upcast the function result if you want) >> I realize that #3 is a bit unpleasant, but are either of the other two >> better? At least #3 shows you where you need to check for problems. > Though I've never used a domain over an array type, I'd strongly prefer #2. Hmm. I hadn't seriously considered that alternative, but we could go in that direction. Logically, what this would probably imply is inserting CastToDomain whenever the result of a polymorphic function is deemed to be of a domain type, whether the base type is array or not. The reason I hadn't taken it very seriously is that I don't think it's actually going to end up being consistent. If we don't do #1 (downcast polymorphic arguments to a base type), but consider the arguments passed to the function to be of the domain type, then really we have to expect the polymorphic function to enforce domain constraints internally; we cannot fix it with something as localized as having the function call parser stick a CastToDomain on top. Here's a possibly rather silly example: create function negate(anyelement) returns anyelement as$$ select - $1 $$ language sql; create domain pos as int check (value > 0); select negate(42::pos); This negate() function will work for any type that has a unary minus operator. But the result of the unary minus operation cannot sanely be considered to be of this domain type. In this simplified example you might feel it doesn't matter, since with an external CastToDomain we'd throw error anyway a moment later, as soon as control comes back from the function. But what if the function does further operations with the value, such as passing it to another polymorphic function? So really, if you go down this path, you end up concluding that PLs supporting polymorphic arguments had better be prepared to enforce domain constraints all the way through, and thus there should be no need for an external CastToDomain --- what comes back from the function ought to be checked already. Unfortunately, even if the PLs do that (SQL functions might get it right, but I'm not real sure whether plpgsql is water-tight on this, and I don't trust the other PLs for it at all), there's no way that built-in polymorphic functions like array_append are going to. So on the whole, #2 looks like an implementation quagmire to me: it's not clear what to check, or where, or how you know when you're done. I'm not willing to volunteer my own time to make it work that way. If somebody else who uses domains a lot wants to step up and take responsibility, go for it. regards, tom lane
Re: Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
From
Noah Misch
Date:
On Wed, May 11, 2011 at 10:22:01AM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Mon, May 09, 2011 at 11:32:28PM -0400, Tom Lane wrote: > >> So we basically had three alternatives to make it better: > >> * downcast to the array type, which would possibly silently > >> break applications that were relying on the function result > >> being considered of the domain type > >> * re-apply domain checks on the function result, which would be > >> a performance hit and possibly again result in unobvious > >> breakage > >> * explicitly break it by throwing a parse error until you > >> downcast (and then upcast the function result if you want) > >> I realize that #3 is a bit unpleasant, but are either of the other two > >> better? At least #3 shows you where you need to check for problems. > > > Though I've never used a domain over an array type, I'd strongly prefer #2. > > Hmm. I hadn't seriously considered that alternative, but we could go in > that direction. Logically, what this would probably imply is inserting > CastToDomain whenever the result of a polymorphic function is deemed to > be of a domain type, whether the base type is array or not. > > The reason I hadn't taken it very seriously is that I don't think it's > actually going to end up being consistent. If we don't do #1 (downcast > polymorphic arguments to a base type), but consider the arguments passed > to the function to be of the domain type, then really we have to expect > the polymorphic function to enforce domain constraints internally; we > cannot fix it with something as localized as having the function call > parser stick a CastToDomain on top. Here's a possibly rather silly > example: > > create function negate(anyelement) returns anyelement as > $$ select - $1 $$ language sql; > > create domain pos as int check (value > 0); > > select negate(42::pos); > > This negate() function will work for any type that has a unary minus > operator. But the result of the unary minus operation cannot sanely be > considered to be of this domain type. While simple, I think that example covers the salient features. The git master behavior is sound: "ERROR: return type mismatch in function declared to return pos DETAIL: Actual return type is integer." If you defined a unary minus operator for the domain type itself, the function implementing that operator would then gain responsibility for preserving any domain constraints. Now, perhaps it's unfortunate that the example can't be easily rewritten to actually work for arbitrary domain inputs (without rewriting it in C). I don't have any particular ideas for improving that. > In this simplified example you > might feel it doesn't matter, since with an external CastToDomain we'd > throw error anyway a moment later, as soon as control comes back from > the function. But what if the function does further operations with the > value, such as passing it to another polymorphic function? The SQL PL understands that the value of -($1::pos) has type integer, as does PL/pgSQL. I'm not seeing any problems offhand. > So really, if you go down this path, you end up concluding that PLs > supporting polymorphic arguments had better be prepared to enforce > domain constraints all the way through, and thus there should be no need > for an external CastToDomain --- what comes back from the function ought > to be checked already. That was my conclusion. I'm not aware of any particular holes in this area, but I won't wager there are none. The code that produces the domain-typed datums is not usually part of the PL implementation, so the PL mostly needs to rigorously track the provenance of its datums. > Unfortunately, even if the PLs do that (SQL > functions might get it right, but I'm not real sure whether plpgsql is > water-tight on this, and I don't trust the other PLs for it at all), > there's no way that built-in polymorphic functions like array_append are > going to. > > So on the whole, #2 looks like an implementation quagmire to me: it's > not clear what to check, or where, or how you know when you're done. Every C function is responsible for returning a datum consistent with its declared return type: we do nothing in particular to ensure that a function has done so. I see domain constraints as a natural extension of that rule. If a C function purports to return a domain-typed datum, perhaps through polymorphism, it's responsible for having checked the datum against that domain. Every PL accepts responsibility for enforcing this on behalf of its functions, much as it enforces general consistency between return types and actual returned datums. A function with a polymorphic argument returning the same polymorphic type is consequently responsible for applying domain checks when the concrete type is a domain. (A function with an anyelement argument that returns anyarray is not a problem, because the return type is never a domain. Likewise for a function with an anyarray argument that returns anyelement.) Functions returning anyarray are at the most risk, because it's actually practical to write a function that fabricates an array of arbitrary type. I can't come up with an analogous example for a scalar type -- they end up delegating the fabrication to type-specific code, like your unary negation example. Again, I think that leaves just array_append, array_prepend, and array_cat actually needing a new check. > I'm not willing to volunteer my own time to make it work that way. > If somebody else who uses domains a lot wants to step up and take > responsibility, go for it. I'm willing the implement the above behavior, if it seems reasonable. nm