Thread: generic builtin functions
I am looking at creating a few generic functions builtin for the enum stuff. These would be tied to each enum type as it is created. However, they should not really appear in pg_proc initially, as there wouldn't be any enum types to tie them to anyway. But I want them to have reserved oids and appear in the list of builtins. So I could hack genbki to exclude them, or I could add some code to remove them from pg_proc after the event. Both of these seem a bit hackish. Maybe there's a trick I'm not aware of? cheers andrew
On Thu, Nov 10, 2005 at 12:02:58PM -0500, Andrew Dunstan wrote: > > I am looking at creating a few generic functions builtin for the enum > stuff. These would be tied to each enum type as it is created. However, > they should not really appear in pg_proc initially, as there wouldn't be > any enum types to tie them to anyway. But I want them to have reserved > oids and appear in the list of builtins. Why? What's wrong with creating the functions when people use the module, like every other module in contrib? Is there a reason you need fixed oids? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout wrote: >On Thu, Nov 10, 2005 at 12:02:58PM -0500, Andrew Dunstan wrote: > > >>I am looking at creating a few generic functions builtin for the enum >>stuff. These would be tied to each enum type as it is created. However, >>they should not really appear in pg_proc initially, as there wouldn't be >>any enum types to tie them to anyway. But I want them to have reserved >>oids and appear in the list of builtins. >> >> > >Why? What's wrong with creating the functions when people use the >module, like every other module in contrib? Is there a reason you need >fixed oids? > > > > This is not intended for contrib. The whole point of the exercise is to have language support, which means either it's builtin or it doesn't happen. See my email with a general outline from a few days ago. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I am looking at creating a few generic functions builtin for the enum > stuff. These would be tied to each enum type as it is created. However, > they should not really appear in pg_proc initially, as there wouldn't be > any enum types to tie them to anyway. But I want them to have reserved > oids and appear in the list of builtins. This feels wrong to me. Ways that might work include: 1. Invent a pseudotype 'anyenum' comparable to 'anyarray', and define the generic functions as taking 'anyenum'. 2. Don't try to define the generic operations as true functions, but make them special syntactic constructs comparable to ROW() or ARRAY[]. I think I like #1 better, but it's hard to be sure when discussing it in a vacuum. How about being more specific about what you want to accomplish? regards, tom lane
Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>I am looking at creating a few generic functions builtin for the enum >>stuff. These would be tied to each enum type as it is created. However, >>they should not really appear in pg_proc initially, as there wouldn't be >>any enum types to tie them to anyway. But I want them to have reserved >>oids and appear in the list of builtins. >> >> > >This feels wrong to me. Ways that might work include: > >1. Invent a pseudotype 'anyenum' comparable to 'anyarray', and define >the generic functions as taking 'anyenum'. > >2. Don't try to define the generic operations as true functions, but >make them special syntactic constructs comparable to ROW() or ARRAY[]. > >I think I like #1 better, but it's hard to be sure when discussing >it in a vacuum. How about being more specific about what you want >to accomplish? > > > > Yeah, after a bit more thought I came to the conclusion that it wouldn't fly. What I want to have is some builtin functions that can be used as the input/output/cast/etc functions for each enum type. The idea wasn't to allow users to overload the functions. I guess we could invent an anyenum pseudotype without actually exposing it via the grammar. Will keep thinking ... cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > What I want to have is some builtin functions that can be used as the > input/output/cast/etc functions for each enum type. The hard part of that is going to be figuring out how to get the information to the functions about which enum type they're being invoked for. Output functions in particular are handed little except the data value itself. Possibly the internal representation of an enum could be 8 bytes: 4 bytes for type OID and 4 more for value. No doubt the mysql guys would rag on us for using too much disk space :-(. But if you did that then the generics would just be anyenum and done. > I guess we could invent an anyenum pseudotype without actually exposing > it via the grammar. Why do you think you need to hide it? regards, tom lane
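A minimal sketch of the 8-byte representation suggested above, written as standalone C rather than backend code. Everything named here (EnumDatum, EnumTypeInfo, enum_catalog, anyenum_out, and the OIDs 90001/90002) is hypothetical and only illustrates why a single generic output function would suffice once the type OID travels with the value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the backend's Oid type. */
typedef uint32_t Oid;

/* One enum datum: 4 bytes of type OID followed by 4 bytes of value. */
typedef struct EnumDatum {
    Oid      type_oid;
    uint32_t ordinal;
} EnumDatum;

/* Hypothetical per-type catalog entry holding the label list. */
typedef struct EnumTypeInfo {
    Oid                type_oid;
    uint32_t           nlabels;
    const char *const *labels;
} EnumTypeInfo;

static const char *const mood_labels[]  = { "sad", "ok", "happy" };
static const char *const color_labels[] = { "red", "green" };

/* Made-up OIDs; a real implementation would consult the system catalogs. */
static const EnumTypeInfo enum_catalog[] = {
    { 90001, 3, mood_labels },
    { 90002, 2, color_labels },
};

/*
 * Generic output function: it is handed nothing but the datum, yet it can
 * still find the right label list because the datum names its own type.
 * Returns NULL for an unknown type or an out-of-range ordinal.
 */
const char *anyenum_out(EnumDatum d)
{
    size_t ntypes = sizeof(enum_catalog) / sizeof(enum_catalog[0]);

    for (size_t i = 0; i < ntypes; i++) {
        if (enum_catalog[i].type_oid == d.type_oid) {
            assert(enum_catalog[i].labels != NULL);
            if (d.ordinal >= enum_catalog[i].nlabels)
                return NULL;
            return enum_catalog[i].labels[d.ordinal];
        }
    }
    return NULL;
}
```

The cost is the one the message points out: every stored value carries four extra bytes that are identical for all rows of a given column.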
Tom Lane <tgl@sss.pgh.pa.us> writes: > Andrew Dunstan <andrew@dunslane.net> writes: > > I am looking at creating a few generic functions builtin for the enum > > stuff. These would be tied to each enum type as it is created. However, > > they should not really appear in pg_proc initially, as there wouldn't be > > any enum types to tie them to anyway. But I want them to have reserved > > oids and appear in the list of builtins. > > This feels wrong to me. Ways that might work include: > > 1. Invent a pseudotype 'anyenum' comparable to 'anyarray', and define > the generic functions as taking 'anyenum'. > > 2. Don't try to define the generic operations as true functions, but > make them special syntactic constructs comparable to ROW() or ARRAY[]. how about 3. Have the functions take an integer and include a cast from enum to integer. I know the tendency has been to want to discourage implicit casts, but I think this is a good use for them. The whole point of enums is to have syntactic sugar over integers that let you use nicer syntax but that imposes minimal additional complexity over simply using integers. Maybe my conception of enums is different from yours. My conception is basically that of C enums. Where they're purely a creature of the syntax and type system. At run-time they don't make any effort to prevent you from treating them as integers. -- greg
Greg Stark <gsstark@mit.edu> writes: > Maybe my conception of enums is different from yours. My conception is > basically that of C enums. Where they're purely a creature of the syntax and > type system. At run-time they don't make any effort to prevent you from > treating them as integers. Well, C is notorious for its weak notions of type, so I hardly think that counts as precedent for what we should do in SQL ;-) I don't mind offering a cast from enum to integer, at all, but I think it needs to be explicit-only. regards, tom lane
On Thu, Nov 10, 2005 at 01:15:07PM -0500, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > What I want to have is some builtin functions that can be used as the > > input/output/cast/etc functions for each enum type. > > The hard part of that is going to be figuring out how to get the > information to the functions about which enum type they're being invoked > for. Output functions in particular are handed little except the data > value itself. For my taggedtypes module I simply created an output function for each type, but they all referred to the same C function. The fmgr interface does allow you to retrieve your own OID, which allows you to search the catalog to determine what type you were called with and/or need to return. I actually built a little LRU cache for the function-to-return-type lookup to avoid most of the overhead. > Possibly the internal representation of an enum could be 8 bytes: 4 > bytes for type OID and 4 more for value. No doubt the mysql guys would > rag on us for using too much disk space :-(. But if you did that then > the generics would just be anyenum and done. That's another way, but is it really worth the effort to make another any* type? For arrays it's worth it because people assume you can make an array of most things. But enums need to be explicitly defined, and how many enums are you expecting anyway? Is pg_proc bloat an actual concern? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
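A rough sketch of the kind of small LRU cache described above, mapping a function OID to a type OID. This is not the taggedtypes code; the names (CacheEntry, type_for_function, slow_catalog_lookup), the slot count, and the fake OID mapping are all assumptions made for illustration, and the catalog search is replaced by a stub.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the backend's Oid type. */
typedef uint32_t Oid;

#define CACHE_SLOTS 4

typedef struct CacheEntry {
    Oid      fn_oid;     /* key: OID of the function that was invoked */
    Oid      type_oid;   /* value: the type that function serves */
    uint64_t last_used;  /* logical clock value; 0 marks an empty slot */
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];
static uint64_t   clock_tick;
static unsigned   slow_lookups;  /* number of cache misses so far */

/* Stub for the expensive catalog search; the mapping is made up. */
static Oid slow_catalog_lookup(Oid fn_oid)
{
    slow_lookups++;
    return fn_oid + 1000;
}

/* Return the type for fn_oid, evicting the least recently used slot on a miss. */
Oid type_for_function(Oid fn_oid)
{
    CacheEntry *victim = &cache[0];

    assert(fn_oid != 0);  /* 0 is InvalidOid in PostgreSQL */

    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].last_used != 0 && cache[i].fn_oid == fn_oid) {
            cache[i].last_used = ++clock_tick;   /* hit: refresh recency */
            return cache[i].type_oid;
        }
        if (cache[i].last_used < victim->last_used)
            victim = &cache[i];                  /* oldest (or empty) slot so far */
    }

    victim->fn_oid = fn_oid;
    victim->type_oid = slow_catalog_lookup(fn_oid);
    victim->last_used = ++clock_tick;
    return victim->type_oid;
}
```

With only a handful of enum types in play, a linear scan over a few slots is likely cheaper than anything more elaborate; a larger cache would want a hash table instead.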
Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>What I want to have is some builtin functions that can be used as the >>input/output/cast/etc functions for each enum type. >> >> > >The hard part of that is going to be figuring out how to get the >information to the functions about which enum type they're being invoked >for. Output functions in particular are handed little except the data >value itself. > >Possibly the internal representation of an enum could be 8 bytes: 4 >bytes for type OID and 4 more for value. No doubt the mysql guys would >rag on us for using too much disk space :-(. But if you did that then >the generics would just be anyenum and done. > > Eek! I would be prepared to go to quite a lot of trouble to avoid that. My idea was to have the functions that need access to the text values look up fcinfo->flinfo->fn_oid and then use that to look up the type info. But that would mean we would need pg_proc entries for these functions for each enum, even if it's the same function underneath, wouldn't it? >>I guess we could invent an anyenum pseudotype without actually exposing >>it via the grammar. >> >> > >Why do you think you need to hide it? > > > > Just desire not to clutter needlessly. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > My idea was to have the functions that need access to the text values > look up fcinfo->flinfo->fn_oid and then use that to look up the type > info. But that would mean we would need pg_proc entries for these > functions for each enum, even if it's the same function underneath, > wouldn't it? Yeah, and you still have to have a pg_proc entry for the original underlying function, else it doesn't get into the builtins list. It's worth pointing out also that while aliasing a builtin function after-the-fact like that is possible, lookup for it is substantially slower than a normal builtin (because we can't do a binary search on OID for it). That's on top of the function-to-type-oid lookup you'll have to do within the function. I'm not convinced that using bigint-equivalent space for an enum is a mortal sin... regards, tom lane
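The lookup-cost point above can be illustrated with a standalone sketch, assuming a builtins table sorted by OID. The table contents and the names (Builtin, lookup_by_oid, lookup_by_name) are hypothetical; the by-name scan merely stands in for whatever slower path an aliased entry would have to take when its own OID is not in the sorted table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the backend's Oid type. */
typedef uint32_t Oid;
typedef int (*BuiltinFn)(int);

typedef struct Builtin {
    Oid         oid;
    const char *name;
    BuiltinFn   fn;
} Builtin;

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }
static int negate(int x)  { return -x; }

/* Must stay sorted by oid for the binary search to be valid. */
static const Builtin builtins[] = {
    { 100, "add_one", add_one },
    { 200, "twice",   twice   },
    { 300, "negate",  negate  },
};
#define NBUILTINS (sizeof(builtins) / sizeof(builtins[0]))

/* Fast path: O(log n) binary search on the function's own OID. */
const Builtin *lookup_by_oid(Oid oid)
{
    size_t lo = 0, hi = NBUILTINS;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (builtins[mid].oid == oid)
            return &builtins[mid];
        if (builtins[mid].oid < oid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Slow path: O(n) scan by name, for entries whose OID is not in the table. */
const Builtin *lookup_by_name(const char *name)
{
    assert(name != NULL);

    for (size_t i = 0; i < NBUILTINS; i++) {
        if (strcmp(builtins[i].name, name) == 0)
            return &builtins[i];
    }
    return NULL;
}
```

A per-enum alias created at run time would get a fresh OID that misses the fast path every time, so each call setup would pay for the slower search before the function-to-type lookup even starts.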
On Thu, Nov 10, 2005 at 03:28:55PM -0500, Andrew Dunstan wrote: > Eek! I would be prepared to go to quite a lot of trouble to avoid that. > > My idea was to have the functions that need access to the text values > look up fcinfo->flinfo->fn_oid and then use that to look up the type > info. But that would mean we would need pg_proc entries for these > functions for each enum, even if it's the same function underneath, > wouldn't it? There are functions in the backend already to help you: argoid = procLookupArgType( fcinfo->flinfo->fn_oid, 0 ); returns the OID of the type of your first argument. returnoid = procLookupRettype( fcinfo->flinfo->fn_oid ); returns your return type. These work even if you are in a type input/output function. Here is some code that uses these: http://svana.org/kleptog/pgsql/taggedtypes.html Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Greg Stark wrote: >I know the tendency has been to want to discourage implicit casts, but I think >this is a good use for them. The whole point of enums is to have syntactic >sugar over integers that let you use nicer syntax but that imposes minimal >additional complexity over simply using integers. > >Maybe my conception of enums is different from yours. My conception is >basically that of C enums. Where they're purely a creature of the syntax and >type system. At run-time they don't make any effort to prevent you from >treating them as integers. > > > Well, for one thing, I have no plan to allow explicit setting of the internal representational value, as one can do in C. And the fact that it's an int underneath is an implementation detail, IMNSHO. After all, KL just advised using a text domain with a check constraint for enums, so int storage is hardly a fundamental part of enum-ness. Maybe this all just reflects my background in languages that are more strongly typed than C and have first class enums. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Well, for one thing, I have no plan to allow explicit setting of the internal > representational value, as one can do in C. And the fact that it's an int > underneath is an implementation detail, IMNSHO. After all, KL just advised > using a text domain with a check constraint for enums, so int storage is > hardly a fundamental part of enum-ness. Well it is in that there's not much point to them if it's not. That is, you can _already_ use a text domain with check constraints if you want. The only point to enums is to let you get the syntax niceness that provides without burdening the implementation with any costs. That is, the whole point of enums is to let you have your cake and eat it too. You get to give the programmers a nice safe interface but tell your DBA you're storing the most space efficient storage format possible. If you don't get that then you may as well use integers or text strings as you prefer. > Maybe this all just reflects my background in languages that are more strongly > typed than C and have first class enums. I suspect this is a matter of perspective. If you speak to the programmers they're liable to agree with you that these languages give this abstract enum thing that could just as easily be stored as strings. But if you speak to the language designers they'll tell you that the whole point was to package up an integer-backed storage in an abstract way and if you implemented them as text there wouldn't have been any point in having them in the language. Even languages like lisp treat symbols as integers internally. The whole point of having symbols is to give an abstraction that programmers can use to hide the internally grungy details that allow reasonably efficient implementations. symbols in lisp can be stored and compared efficiently because they're interned and can be treated as integers. If they were stored as strings there would be no point in having them. -- greg
Tom Lane wrote: >I'm not convinced that using bigint-equivalent space for an enum is a >mortal sin... > > > > at least venial ... cheers andrew
On Thu, Nov 10, 2005 at 04:08:29PM -0500, Andrew Dunstan wrote: > > > Tom Lane wrote: > > >I'm not convinced that using bigint-equivalent space for an enum is > >a mortal sin... > > at least venial ... Heh. Would ORDER BY somehow know about enums' given ordering? Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
David Fetter wrote: >On Thu, Nov 10, 2005 at 04:08:29PM -0500, Andrew Dunstan wrote: > > >>Tom Lane wrote: >> >> >> >>>I'm not convinced that using bigint-equivalent space for an enum is >>>a mortal sin... >>> >>> >>at least venial ... >> >> > >Heh. > >Would ORDER BY somehow know about enums' given ordering? > > > > ORDER BY (and all inequality operators) will reflect the defined enumeration ordering, as happens today with enumkit-defined types. That is a fundamental requirement that I won't deviate from. cheers andrew
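A small sketch of what "ORDER BY reflects the defined enumeration ordering" means in practice, assuming values are stored as ordinals in declaration order. The label list and function names are hypothetical; the point is only that comparison happens on the ordinal, never on the label text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical enum declared in the order small, medium, large. */
static const char *const size_labels[] = { "small", "medium", "large" };
#define NSIZES (sizeof(size_labels) / sizeof(size_labels[0]))

/* Map a stored ordinal back to its label, or NULL if out of range. */
const char *enum_label(uint32_t ordinal)
{
    return ordinal < NSIZES ? size_labels[ordinal] : NULL;
}

/* Compare two stored values by ordinal, i.e. by declaration order. */
static int enum_cmp(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *) a;
    uint32_t y = *(const uint32_t *) b;

    return (x > y) - (x < y);
}

/* Sort stored enum values the way ORDER BY on the enum column would. */
void sort_enum_values(uint32_t *vals, size_t n)
{
    assert(vals != NULL || n == 0);
    qsort(vals, n, sizeof(*vals), enum_cmp);
}
```

Sorting the labels as text would instead give large, medium, small, which is exactly the behaviour a text domain with a check constraint produces and the enum proposal is meant to avoid.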
On Thu, Nov 10, 2005 at 05:26:45PM -0500, Andrew Dunstan wrote: > > > David Fetter wrote: > > >On Thu, Nov 10, 2005 at 04:08:29PM -0500, Andrew Dunstan wrote: > > > > > >>Tom Lane wrote: > >> > >>>I'm not convinced that using bigint-equivalent space for an enum is > >>>a mortal sin... > >>> > >>at least venial ... > > > >Heh. > > > >Would ORDER BY somehow know about enums' given ordering? > > ORDER BY (and all inequality operators) will reflect the defined > enumeration ordering, as happens today with enumkit-defined types. > > That is a fundamental requirement that I won't deviate from. Great :) :) I hadn't understood how the enumkit stuff worked. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
Tom Lane said: > Andrew Dunstan <andrew@dunslane.net> writes: >> My idea was to have the functions that need access to the text values >> look up fcinfo->flinfo->fn_oid and then use that to look up the type >> info. But that would mean we would need pg_proc entries for these >> functions for each enum, even if it's the same function underneath, >> wouldn't it? > > Yeah, and you still have to have a pg_proc entry for the original > underlying function, else it doesn't get into the builtins list. > > It's worth pointing out also that while aliasing a builtin function > after-the-fact like that is possible, lookup for it is substantially > slower than a normal builtin (because we can't do a binary search on > OID for it). That's on top of the function-to-type-oid lookup you'll > have to do within the function. > > I'm not convinced that using bigint-equivalent space for an enum is a > mortal sin... > What about having the calling code fill in the io type oid in an extra field in the flinfo? That possibly still leaves the builtin/aliasing issue ... that deserves more thought. There's no rush on this. cheers andrew
"Andrew Dunstan" <andrew@dunslane.net> writes: > What about having the calling code fill in the io type oid in an extra field > in the flinfo? I don't think that's workable; for one thing there's the problem of manual invocation of the I/O functions, which is not going to provide any such special hack. It also turns the enum proposal into a seriously invasive patch (hitting all PLs both inside and outside the core, for instance), at which point you'll start encountering some significant push-back. BTW, you might want to think about what'd be involved in supporting arrays and domains over enums ... regards, tom lane
Tom Lane wrote: >"Andrew Dunstan" <andrew@dunslane.net> writes: > > >>What about having the calling code fill in the io type oid in an extra field >>in the flinfo? >> >> > >I don't think that's workable; for one thing there's the problem of >manual invocation of the I/O functions, which is not going to provide >any such special hack. It also turns the enum proposal into a seriously >invasive patch (hitting all PLs both inside and outside the core, for >instance), at which point you'll start encountering some significant >push-back. > > Darn. I see that. Stuff like: tmp = DatumGetCString(FunctionCall1(&(desc->arg_out_func[i]), fcinfo->arg[i])); At this stage I am probably going to go with your 64bit proposal, on the ground that it will permit some progress, and in the possibly vain hope that someone will have a flash of insight that will let us do it less redundantly in future. >BTW, you might want to think about what'd be involved in supporting >arrays and domains over enums ... > > > > > Yeah. on my list. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Still thinking a bit more about this ... how about we have output > functions take an optional second argument, which is the type oid? No. We just undid that for good and sufficient security reasons. If an output function depends on anything more than the contents of the object it's handed, it's vulnerable to being lied to. http://archives.postgresql.org/pgsql-hackers/2005-04/msg00998.php I realize that being told the wrong type ID might be less than catastrophic for enum_out, but I'm going to fiercely resist putting back any extra arguments for output functions. The temptation to use them unsafely is just too strong --- we've learned that the hard way more than once already, and I don't want to repeat the same mistake yet again. > Input funcs get a typioparam and typmod argument in addition to the > data value, Entirely different situation because the only thing an input function assumes is that it's been handed some text ... which it can validate. regards, tom lane
Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>Still thinking a bit more about this ... how about we have output >>functions take an optional second argument, which is the type oid? >> >> > >No. We just undid that for good and sufficient security reasons. >If an output function depends on anything more than the contents of >the object it's handed, it's vulnerable to being lied to. >http://archives.postgresql.org/pgsql-hackers/2005-04/msg00998.php > >I realize that being told the wrong type ID might be less than >catastrophic for enum_out, but I'm going to fiercely resist putting back >any extra arguments for output functions. The temptation to use them >unsafely is just too strong --- we've learned that the hard way more >than once already, and I don't want to repeat the same mistake yet again. > > > >>Input funcs get a typioparam and typmod argument in addition to the >>data value, >> >> > >Entirely different situation because the only thing an input function >assumes is that it's been handed some text ... which it can validate. > > OK, if you resist fiercely I'm sure it won't happen, so I'll put away the asbestos underpants ;-) I see in that discussion you say: >Every rowtype Datum will carry its own concrete type. > Are we storing that on disk in every composite object? If not, where do we get it from before calling the output function? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I see in that discussion you say: >> Every rowtype Datum will carry its own concrete type. > Are we storing that on disk in every composite object? Yup. For a rowtype datum it's not a serious overhead. I realize that the same is not true of enums :-( regards, tom lane