RFC for adding typmods to functions - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | RFC for adding typmods to functions |
Date | |
Msg-id | 5192.1258495767@sss.pgh.pa.us |
List | pgsql-hackers |
Pavel submitted a patch to add typmods to function declarations, but there was no prior design discussion and it desperately needs some. Let me try to summarize the issues that seem to need agreement.

The proposed patch allows optional typmods to be attached to the declared argument and result types of a function; for example you could say "create function foo(numeric(2)) returns numeric(4)". (Note: in existing releases, this syntax works but the typmod information is simply discarded.) An immediate application, not implemented here but which we'd like to have for 8.5, is multiple anyelement types --- for example,

    create function foo(anyelement, anyelement,
                        anyelement(1), anyelement(1))
    returns anyelement(1)

says that the first and second arguments must be of the same type, the third and fourth must also be of the same type but not necessarily the same as the first two, and the result is of this second type.

I can see the following definitional issues:

1. Are the typmods of input arguments part of the function signature, ie, could foo(numeric(2)) and foo(numeric(3)) coexist?

The proposed patch answers "no, they are the same function and you can have only one". This may be good enough, but there are some possible uses that we are foreclosing by doing this. Two sample applications:

    foo(numeric)                      a general-purpose function
    foo(numeric(2))                   same definition but optimized for short inputs

    foo(anyelement, anyelement(1))    general case
    foo(anyelement, anyelement)       optimized for identical input types

The major obstacle to allowing such cases is that we'd need to invent new ambiguous-function resolution rules that would let us figure out which function to prefer for a given set of inputs, and it's not at all clear how to do that --- in particular deciding that one is preferable to another seems to require type-specific knowledge about the meaning of different typmods. So that looks like a major can of worms, probably requiring new APIs for custom data types.

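To make the "you can have only one" answer concrete, here is a toy model in Python. It is only a sketch of the rule as described above, not the patch's code or anything in the backend; the names `parse_type` and `Catalog`, and the error text, are made up for illustration. The assumption is that function identity is the function name plus the base argument types, with any declared typmod parsed and remembered but left out of the identity key.

```python
import re


def parse_type(decl):
    """Split a declared type such as 'numeric(2)' into (base, typmod)."""
    m = re.fullmatch(
        r"\s*([A-Za-z_][A-Za-z0-9_ ]*?)\s*(?:\(\s*(\d+)\s*\))?\s*", decl
    )
    if m is None:
        raise ValueError(f"cannot parse type declaration: {decl!r}")
    base = m.group(1).lower()
    typmod = int(m.group(2)) if m.group(2) is not None else None
    return base, typmod


class Catalog:
    """Hypothetical function catalog: identity is (name, base arg types)."""

    def __init__(self):
        self._functions = {}

    @staticmethod
    def signature(name, arg_decls):
        # Typmods are deliberately dropped from the identity key.
        return (name.lower(), tuple(parse_type(d)[0] for d in arg_decls))

    def create(self, name, arg_decls):
        key = self.signature(name, arg_decls)
        if key in self._functions:
            raise ValueError(
                f"function {name} already exists with same argument types"
            )
        # Keep the declared typmods so a later coercion step could use them.
        self._functions[key] = [parse_type(d) for d in arg_decls]

    def lookup(self, name, arg_decls):
        # Returns the stored (base, typmod) list, or None if no such function.
        return self._functions.get(self.signature(name, arg_decls))


catalog = Catalog()
catalog.create("foo", ["numeric(2)"])

try:
    catalog.create("foo", ["numeric(3)"])
    coexist = True
except ValueError:
    coexist = False  # same identity, so the second definition is rejected
```

In this model foo(numeric(2)) and foo(numeric(3)) map to the same key, so the second `create` fails, and a lookup by plain `numeric` still finds the function. Letting the two coexist would mean putting the typmod into the key, at which point `lookup` would need some rule for choosing between candidates, which is the ambiguity-resolution problem described above.
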
A possible compromise is to say that you can have only one now but leave the door open to allow more than one later. However, the function signature is the function identity for many purposes, so it's hard to be fuzzy about this. For example, given "CREATE FUNCTION foo(numeric(2))", which of the following should drop the function?

    DROP FUNCTION foo(numeric(2));
    DROP FUNCTION foo(numeric);
    DROP FUNCTION foo(numeric(3));

The traditional behavior is that any of these would work, since the typmod was ignored anyway. If the typmod means something then the second one is a bit surprising and the third definitely doesn't satisfy the POLA. Are we prepared to possibly break existing apps now by disallowing the third and/or second?

2. What is the exact meaning of attaching a typmod to an input argument?

As the patch has it, doing so means nothing at all for the purposes of resolving which function to call, and then once we have identified the function we will attempt to apply an implicit coercion to the actual input argument to make it match the typmod.

The first part of that is probably reasonable if you accept the "there can be only one" answer to point #1; but if you don't then it's completely unworkable. In any case it's worth noting that foo(anyelement, anyelement) will accept two arguments of the same types and different typmods, which might surprise people.

The second part is trickier, in particular the fact that the coercion is implicit. Up to now there have been only assignment and explicit coercions that could try to apply a typmod to a value. Our existing API for coercion functions (see the CREATE CAST man page if you don't recall details) doesn't even provide a way for the coercion function to distinguish implicit from assignment coercions. Maybe this is fine --- on that same page we say it's bad design for coercion functions to pay attention to the cast context anyhow.
But we had better agree that it's okay for such coercions to behave more like assignment than like a traditional implicit cast. If you want to distinguish the cases, we need to break that API.

3. What is the exact meaning of attaching a typmod to a result or output argument?

There are two fundamentally different views you can take on this point: that the typmod is an assertion that the function result matches the typmod, or that the typmod requests a run-time coercion step to make the result match the typmod.

For C-level functions the first of these seems more natural; after all we take it on faith that the result is of the declared type. In particular, you *have to* adopt that viewpoint towards the coercion functions of the type, because the system has no other knowledge of what a typmod means than "the results of the type's coercion functions have the correct properties for the given typmod value".

For PL functions I doubt we want to trust the function writer completely that his results match the typmod, but should we adopt an approach of "check the result" (and, presumably, throw error if it doesn't meet the typmod) or "force a coercion" (and if so, with which semantics --- explicit, assignment, implicit)? The former would require infrastructure we have not currently got, ie, a "check typmod" function for datatypes supporting typmods. The latter seems a bit ugly because it gives PL functions a subtly different set of semantics from C functions.

In either case it seems we'd have to hope that all PL authors remember to insert code to do that, or else we have a hole in the type system: functions returning values that don't meet the typmod the system thinks they do. We can fix all the built-in PLs but I'll gladly wager that at least one third-party PL will forget to deal with this, and nobody will notice until it's reported as a security bug.

4. What about functions whose output typmod should depend on the input typmod(s)?
I mentioned earlier the example that concatenation of varchar(M) and varchar(N) should produce varchar(M+N). We could possibly punt on this for the time being; supporting only fixed output typmods for now doesn't obviously foreclose us from adding support for computed typmods later.

However there is still one nasty case that we cannot push off till later: given a function that takes and returns a polymorphic type such as anyelement, and an actual argument with a typmod (eg numeric(2)), is the result numeric(2) or just numeric? As things stand we would have little choice but to say the latter, because we don't know what the function might do with the value, and there are too many real cases where the result might not have the same typmod. But there are also a lot of cases where you *would* wish that it has the same typmod, and this patch raises the stakes for throwing away typmods mid-expression. Is this okay, and if not what could we do about it?

Unless we have consensus on all of these points I don't think we should proceed with the patch.

Comments?

            regards, tom lane