Thread: generic builtin functions

generic builtin functions

From
Andrew Dunstan
Date:
I am looking at creating a few generic functions builtin for the enum 
stuff. These would be tied to each enum type as it is created. However, 
they should not really appear in pg_proc initially, as there wouldn't be 
any enum types to tie them to anyway. But I want them to have reserved 
oids and appear in the list of builtins.

So I could hack genbki to exclude them, or I could add some code to 
remove them from pg_proc after the event. Both of these seem a bit 
hackish. Maybe there's a trick I'm not aware of?

cheers

andrew


Re: generic builtin functions

From
Martijn van Oosterhout
Date:
On Thu, Nov 10, 2005 at 12:02:58PM -0500, Andrew Dunstan wrote:
>
> I am looking at creating a few generic functions builtin for the enum
> stuff. These would be tied to each enum type as it is created. However,
> they should not really appear in pg_proc initially, as there wouldn't be
> any enum types to tie them to anyway. But I want them to have reserved
> oids and appear in the list of builtins.

Why? What's wrong with creating the functions when people use the
module, like every other module in contrib? Is there a reason you need
fixed oids?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: generic builtin functions

From
Andrew Dunstan
Date:

Martijn van Oosterhout wrote:

>On Thu, Nov 10, 2005 at 12:02:58PM -0500, Andrew Dunstan wrote:
>
>>I am looking at creating a few generic functions builtin for the enum 
>>stuff. These would be tied to each enum type as it is created. However, 
>>they should not really appear in pg_proc initially, as there wouldn't be 
>>any enum types to tie them to anyway. But I want them to have reserved 
>>oids and appear in the list of builtins.
>
>Why? What's wrong with creating the functions when people use the
>module, like every other module in contrib? Is there a reason you need
>fixed oids?

This is not intended for contrib. The whole point of the exercise is to 
have language support, which means either it's builtin or it doesn't 
happen. See my email with a general outline from a few days ago.

cheers

andrew


Re: generic builtin functions

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I am looking at creating a few generic functions builtin for the enum 
> stuff. These would be tied to each enum type as it is created. However, 
> they should not really appear in pg_proc initially, as there wouldn't be 
> any enum types to tie them to anyway. But I want them to have reserved 
> oids and appear in the list of builtins.

This feels wrong to me.  Ways that might work include:

1. Invent a pseudotype 'anyenum' comparable to 'anyarray', and define
the generic functions as taking 'anyenum'.

2. Don't try to define the generic operations as true functions, but
make them special syntactic constructs comparable to ROW() or ARRAY[].

I think I like #1 better, but it's hard to be sure when discussing
it in a vacuum.  How about being more specific about what you want
to accomplish?
        regards, tom lane


Re: generic builtin functions

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>>I am looking at creating a few generic functions builtin for the enum 
>>stuff. These would be tied to each enum type as it is created. However, 
>>they should not really appear in pg_proc initially, as there wouldn't be 
>>any enum types to tie them to anyway. But I want them to have reserved 
>>oids and appear in the list of builtins.
>
>This feels wrong to me.  Ways that might work include:
>
>1. Invent a pseudotype 'anyenum' comparable to 'anyarray', and define
>the generic functions as taking 'anyenum'.
>
>2. Don't try to define the generic operations as true functions, but
>make them special syntactic constructs comparable to ROW() or ARRAY[].
>
>I think I like #1 better, but it's hard to be sure when discussing
>it in a vacuum.  How about being more specific about what you want
>to accomplish?

Yeah, after a bit more thought I came to the conclusion that it wouldn't 
fly.

What I want to have is some builtin functions that can be used as the 
input/output/cast/etc functions for each enum type. The idea wasn't to 
allow users to overload the functions.

I guess we could invent an anyenum pseudotype without actually exposing 
it via the grammar.

Will keep thinking ...

cheers

andrew


Re: generic builtin functions

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> What I want to have is some builtin functions that can be used as the 
> input/output/cast/etc functions for each enum type.

The hard part of that is going to be figuring out how to get the
information to the functions about which enum type they're being invoked
for.  Output functions in particular are handed little except the data
value itself.

Possibly the internal representation of an enum could be 8 bytes: 4
bytes for type OID and 4 more for value.  No doubt the mysql guys would
rag on us for using too much disk space :-(.  But if you did that then
the generics would just be anyenum and done.
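
A minimal sketch of the 8-byte layout suggested above, assuming a toy in-memory table in place of the real system catalogs. The names `EnumDatum`, `toy_catalog`, and `enum_out_sketch`, and the OID values, are hypothetical and are not PostgreSQL APIs. The point is only that the output routine needs nothing beyond the value it is handed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte enum value: 4 bytes of type OID, 4 bytes of ordinal. */
typedef struct EnumDatum {
    uint32_t type_oid;  /* which enum type this value belongs to */
    uint32_t ordinal;   /* position of the label within that type */
} EnumDatum;

/* Toy stand-in for a catalog: one row per enum type. */
typedef struct ToyEnumType {
    uint32_t type_oid;
    size_t nlabels;
    const char *const *labels;
} ToyEnumType;

static const char *const color_labels[] = {"red", "green", "blue"};
static const char *const mood_labels[] = {"sad", "ok", "happy"};

/* Invented OIDs, for illustration only. */
static const ToyEnumType toy_catalog[] = {
    {70001, 3, color_labels},
    {70002, 3, mood_labels},
};

/* Output sketch: finds the label using only the datum itself.
 * Returns NULL for an unknown type OID or an out-of-range ordinal. */
const char *enum_out_sketch(EnumDatum d)
{
    size_t ntypes = sizeof(toy_catalog) / sizeof(toy_catalog[0]);
    for (size_t i = 0; i < ntypes; i++) {
        if (toy_catalog[i].type_oid == d.type_oid) {
            if (d.ordinal < toy_catalog[i].nlabels)
                return toy_catalog[i].labels[d.ordinal];
            return NULL;
        }
    }
    return NULL;
}
```

The cost is the one named above: every stored value carries four extra bytes of type OID.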

> I guess we could invent an anyenum pseudotype without actually exposing 
> it via the grammar.

Why do you think you need to hide it?
        regards, tom lane


Re: generic builtin functions

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Andrew Dunstan <andrew@dunslane.net> writes:
> > I am looking at creating a few generic functions builtin for the enum 
> > stuff. These would be tied to each enum type as it is created. However, 
> > they should not really appear in pg_proc initially, as there wouldn't be 
> > any enum types to tie them to anyway. But I want them to have reserved 
> > oids and appear in the list of builtins.
> 
> This feels wrong to me.  Ways that might work include:
> 
> 1. Invent a pseudotype 'anyenum' comparable to 'anyarray', and define
> the generic functions as taking 'anyenum'.
> 
> 2. Don't try to define the generic operations as true functions, but
> make them special syntactic constructs comparable to ROW() or ARRAY[].

how about 3. Have the functions take an integer and include a cast from enum
to integer.

I know the tendency has been to want to discourage implicit casts, but I think
this is a good use for them. The whole point of enums is to have syntactic
sugar over integers that let you use nicer syntax but that imposes minimal
additional complexity over simply using integers.

Maybe my conception of enums is different from yours. My conception is
basically that of C enums. Where they're purely a creature of the syntax and
type system. At run-time they don't make any effort to prevent you from
treating them as integers.
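
For reference, a small sketch of the C behaviour being described, using a hypothetical `enum color`. The compiler lets enum values take part in integer conversions and arithmetic without any cast.

```c
/* Hypothetical enum used only for illustration. */
enum color { RED, GREEN, BLUE };  /* RED == 0, GREEN == 1, BLUE == 2 */

/* An enum value converts to int implicitly; no cast is required. */
int color_as_int(enum color c)
{
    return c;
}

/* Arithmetic on enum values is ordinary integer arithmetic. */
int color_sum(enum color a, enum color b)
{
    return a + b;
}
```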

-- 
greg



Re: generic builtin functions

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Maybe my conception of enums is different from yours. My conception is
> basically that of C enums. Where they're purely a creature of the syntax and
> type system. At run-time they don't make any effort to prevent you from
> treating them as integers.

Well, C is notorious for its weak notions of type, so I hardly think
that counts as precedent for what we should do in SQL ;-)

I don't mind offering a cast from enum to integer, at all, but I think
it needs to be explicit-only.
        regards, tom lane


Re: generic builtin functions

From
Martijn van Oosterhout
Date:
On Thu, Nov 10, 2005 at 01:15:07PM -0500, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > What I want to have is some builtin functions that can be used as the
> > input/output/cast/etc functions for each enum type.
>
> The hard part of that is going to be figuring out how to get the
> information to the functions about which enum type they're being invoked
> for.  Output functions in particular are handed little except the data
> value itself.

For my taggedtypes module I simply created an output function for each
type, but they all referred to the same C function. The fmgr interface
does allow you to retrieve your own OID, which allows you to search the
catalog to determine what type you were called with and/or need to
return. I actually built a little LRU cache for the
function-to-return-type lookup to avoid most of the overhead.
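
A rough sketch of that kind of cache, assuming a fixed number of slots and a stand-in for the catalog search. This is not the taggedtypes code; `type_for_function`, `slow_catalog_lookup`, and the OID mapping are all hypothetical.

```c
#include <stdint.h>

#define CACHE_SLOTS 4

typedef struct CacheEntry {
    uint32_t fn_oid;     /* key: OID of the pg_proc entry we were called as */
    uint32_t type_oid;   /* value: the type that entry is tied to */
    unsigned long used;  /* last-use tick for LRU eviction; 0 means empty */
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];
static unsigned long tick;
int slow_lookups;  /* counts calls to the stand-in catalog search */

/* Stand-in for a catalog search; the mapping here is invented. */
static uint32_t slow_catalog_lookup(uint32_t fn_oid)
{
    slow_lookups++;
    return fn_oid + 1000;
}

/* Map a function OID to its type OID, consulting the cache first. */
uint32_t type_for_function(uint32_t fn_oid)
{
    int victim = 0;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used != 0 && cache[i].fn_oid == fn_oid) {
            cache[i].used = ++tick;  /* hit: refresh recency */
            return cache[i].type_oid;
        }
        if (cache[i].used < cache[victim].used)
            victim = i;  /* track the least recently used (or empty) slot */
    }
    /* Miss: do the slow lookup and overwrite the chosen slot. */
    cache[victim].fn_oid = fn_oid;
    cache[victim].type_oid = slow_catalog_lookup(fn_oid);
    cache[victim].used = ++tick;
    return cache[victim].type_oid;
}
```

Repeated calls for the same function OID then skip the slow lookup until the entry is evicted.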

> Possibly the internal representation of an enum could be 8 bytes: 4
> bytes for type OID and 4 more for value.  No doubt the mysql guys would
> rag on us for using too much disk space :-(.  But if you did that then
> the generics would just be anyenum and done.

That's another way, but is it really worth the effort to make another any*
type? For arrays it's worth it because people assume you can make an
array of most things. But enums need to be explicitly defined, and how
many enums are you expecting anyway? Is pg_proc bloat an actual
concern?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: generic builtin functions

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>>What I want to have is some builtin functions that can be used as the 
>>input/output/cast/etc functions for each enum type.
>
>The hard part of that is going to be figuring out how to get the
>information to the functions about which enum type they're being invoked
>for.  Output functions in particular are handed little except the data
>value itself.
>
>Possibly the internal representation of an enum could be 8 bytes: 4
>bytes for type OID and 4 more for value.  No doubt the mysql guys would
>rag on us for using too much disk space :-(.  But if you did that then
>the generics would just be anyenum and done.


Eek! I would be prepared to go to quite a lot of trouble to avoid that.

My idea was to have the functions that need access to the text values 
look up fcinfo->flinfo->fn_oid and then use that to look up the type 
info. But that would mean we would need pg_proc entries for these 
functions for each enum, even if it's the same function underneath, 
wouldn't it?


>>I guess we could invent an anyenum pseudotype without actually exposing 
>>it via the grammar.
>
>Why do you think you need to hide it?

Just desire not to clutter needlessly.

cheers

andrew


Re: generic builtin functions

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> My idea was to have the functions that need access to the text values 
> look up fcinfo->flinfo->fn_oid and then use that to look up the type 
> info. But that would mean we would need pg_proc entries for these 
> functions for each enum, even if it's the same function underneath, 
> wouldn't it?

Yeah, and you still have to have a pg_proc entry for the original
underlying function, else it doesn't get into the builtins list.

It's worth pointing out also that while aliasing a builtin function
after-the-fact like that is possible, lookup for it is substantially
slower than a normal builtin (because we can't do a binary search on
OID for it).  That's on top of the function-to-type-oid lookup you'll
have to do within the function.
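
A simplified model of the difference, assuming a builtins table sorted by OID. The struct, the entries, and the function names are invented for illustration and do not reproduce the backend's actual tables.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct BuiltinEntry {
    uint32_t oid;        /* function OID; the table is sorted by this */
    const char *symbol;  /* C-level function name */
} BuiltinEntry;

/* Invented entries, sorted by OID. */
static const BuiltinEntry builtins[] = {
    {101, "enum_in_generic"},
    {205, "enum_out_generic"},
    {310, "enum_cmp_generic"},
    {478, "enum_to_int_generic"},
};

#define NBUILTINS (sizeof(builtins) / sizeof(builtins[0]))

/* Fast path: binary search on OID, O(log n). */
const BuiltinEntry *lookup_by_oid(uint32_t oid)
{
    size_t lo = 0, hi = NBUILTINS;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (builtins[mid].oid == oid)
            return &builtins[mid];
        if (builtins[mid].oid < oid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Slow path for an alias: its own OID is not in the table, so the only
 * key left is the symbol name, and the table is not sorted by name. */
const BuiltinEntry *lookup_by_symbol(const char *symbol)
{
    for (size_t i = 0; i < NBUILTINS; i++)
        if (strcmp(builtins[i].symbol, symbol) == 0)
            return &builtins[i];
    return NULL;
}
```

An alias created after the fact has a new OID that misses the first lookup, so it pays for the linear scan by name instead.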

I'm not convinced that using bigint-equivalent space for an enum is a
mortal sin...
        regards, tom lane


Re: generic builtin functions

From
Martijn van Oosterhout
Date:
On Thu, Nov 10, 2005 at 03:28:55PM -0500, Andrew Dunstan wrote:
> Eek! I would be prepared to go to quite a lot of trouble to avoid that.
>
> My idea was to have the functions that need access to the text values
> look up fcinfo->flinfo->fn_oid and then use that to look up the type
> info. But that would mean we would need pg_proc entries for these
> functions for each enum, even if it's the same function underneath,
> wouldn't it?

There are functions in the backend already to help you:

  argoid = procLookupArgType( fcinfo->flinfo->fn_oid, 0 );

returns the OID of the type of your first argument, and

  returnoid = procLookupRettype( fcinfo->flinfo->fn_oid );

returns your return type. These work even if you are in a type
input/output function.

Here is some code that uses these:

http://svana.org/kleptog/pgsql/taggedtypes.html

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: generic builtin functions

From
Andrew Dunstan
Date:

Greg Stark wrote:

>I know the tendency has been to want to discourage implicit casts, but I think
>this is a good use for them. The whole point of enums is to have syntactic
>sugar over integers that let you use nicer syntax but that imposes minimal
>additional complexity over simply using integers.
>
>Maybe my conception of enums is different from yours. My conception is
>basically that of C enums. Where they're purely a creature of the syntax and
>type system. At run-time they don't make any effort to prevent you from
>treating them as integers.

Well, for one thing, I have no plan to allow explicit setting of the 
internal representational value, as one can do in C. And the fact that 
it's an int underneath is an implementation detail, IMNSHO. After all, 
KL just advised using a text domain with a check constraint for enums, 
so int storage is hardly a fundamental part of enum-ness.

Maybe this all just reflects my background in languages that are more 
strongly typed than C and have first class enums.

cheers

andrew






Re: generic builtin functions

From
Greg Stark
Date:
Andrew Dunstan <andrew@dunslane.net> writes:

> Well, for one thing, I have no plan to allow explicit setting of the internal
> representational value, as one can do in C. And the fact that it's an int
> underneath is an implementation detail, IMNSHO. After all, KL just advised
> using a text domain with a check constraint for enums, so int storage is
> hardly a fundamental part of enum-ness.

Well it is in that there's not much point to them if it's not. That is, you
can _already_ use a text domain with check constraints if you want. The only
point to enums is to let you get the syntax niceness that provides without
burdening the implementation with any costs.

That is, the whole point of enums is to let you have your cake and eat it too.
You get to give the programmers a nice safe interface but tell your DBA you're
storing the most space efficient storage format possible.

If you don't get that then you may as well use integers or text strings as you
prefer.

> Maybe this all just reflects my background in languages that are more strongly
> typed than C and have first class enums.

I suspect this is a matter of perspective. If you speak to the programmers
they're liable to agree with you that these languages give this abstract enum
thing that could just as easily be stored as strings. But if you speak to the
language designers they'll tell you that the whole point was to package up an
integer-backed storage in an abstract way and if you implemented them as text
there wouldn't have been any point in having them in the language.

Even languages like lisp treat symbols as integers internally. The whole point
of having symbols is to give an abstraction that programmers can use to hide
the internally grungy details that allow reasonably efficient implementations.
symbols in lisp can be stored and compared efficiently because they're
interned and can be treated as integers. If they were stored as strings there
would be no point in having them.
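
A minimal sketch of interning, written in C rather than lisp and assuming a small fixed-size table. `intern` and `same_symbol` are hypothetical names; a real implementation would normally use a hash table rather than a linear scan.

```c
#include <string.h>

#define MAX_SYMBOLS 64

static const char *symbol_table[MAX_SYMBOLS];
static int symbol_count;

/* Return a small integer id for name, assigning a new one on first sight.
 * Returns -1 if the table is full. The caller must keep name alive. */
int intern(const char *name)
{
    for (int i = 0; i < symbol_count; i++)
        if (strcmp(symbol_table[i], name) == 0)
            return i;
    if (symbol_count == MAX_SYMBOLS)
        return -1;
    symbol_table[symbol_count] = name;
    return symbol_count++;
}

/* Once interned, equality is one integer comparison, not a strcmp. */
int same_symbol(int a, int b)
{
    return a == b;
}
```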

-- 
greg



Re: generic builtin functions

From
Andrew Dunstan
Date:

Tom Lane wrote:

>I'm not convinced that using bigint-equivalent space for an enum is a
>mortal sin...

at least venial ...

cheers

andrew


Re: generic builtin functions

From
David Fetter
Date:
On Thu, Nov 10, 2005 at 04:08:29PM -0500, Andrew Dunstan wrote:
> 
> 
> Tom Lane wrote:
> 
> >I'm not convinced that using bigint-equivalent space for an enum is
> >a mortal sin...
> 
> at least venial ...

Heh.

Would ORDER BY somehow know about enums' given ordering?

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!


Re: generic builtin functions

From
Andrew Dunstan
Date:

David Fetter wrote:

>On Thu, Nov 10, 2005 at 04:08:29PM -0500, Andrew Dunstan wrote:
>
>>Tom Lane wrote:
>>
>>>I'm not convinced that using bigint-equivalent space for an enum is
>>>a mortal sin...
>>
>>at least venial ...
>
>Heh.
>
>Would ORDER BY somehow know about enums' given ordering?

ORDER BY (and all inequality operators) will reflect the defined 
enumeration ordering, as happens today with enumkit-defined types.

That is a fundamental requirement that I won't deviate from.
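
A small sketch of what that requirement means in practice, assuming each value is stored as its position in the declaration. The enum ('low', 'medium', 'high') and the function names are hypothetical, and this is not the enumkit implementation.

```c
#include <stdlib.h>

/* Hypothetical enum declared in the order ('low', 'medium', 'high');
 * each value is stored as its position in that declaration. */
static const char *const priority_labels[] = {"low", "medium", "high"};

/* Comparator over stored ordinals: declaration order, not label text. */
static int cmp_ordinal(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort ordinals in place, mimicking an ORDER BY on such a column. */
void sort_by_declared_order(int *ordinals, size_t n)
{
    qsort(ordinals, n, sizeof(int), cmp_ordinal);
}

/* Map a valid ordinal (0..2) back to its label. */
const char *priority_label(int ordinal)
{
    return priority_labels[ordinal];
}
```

Sorting the labels as text would give high, low, medium; sorting the ordinals gives low, medium, high.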

cheers

andrew


Re: generic builtin functions

From
David Fetter
Date:
On Thu, Nov 10, 2005 at 05:26:45PM -0500, Andrew Dunstan wrote:
> 
> 
> David Fetter wrote:
> 
> >On Thu, Nov 10, 2005 at 04:08:29PM -0500, Andrew Dunstan wrote:
> > 
> >
> >>Tom Lane wrote:
> >>
> >>>I'm not convinced that using bigint-equivalent space for an enum is
> >>>a mortal sin...
> >>>
> >>at least venial ...
> >
> >Heh.
> >
> >Would ORDER BY somehow know about enums' given ordering?
> 
> ORDER BY (and all inequality operators) will reflect the defined
> enumeration ordering, as happens today with enumkit-defined types.
> 
> That is a fundamental requirement that I won't deviate from.

Great :) :)

I hadn't understood how the enumkit stuff worked.

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/


Re: generic builtin functions

From
"Andrew Dunstan"
Date:
Tom Lane said:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> My idea was to have the functions that need access to the text values
>> look up fcinfo->flinfo->fn_oid and then use that to look up the type
>> info. But that would mean we would need pg_proc entries for these
>> functions for each enum, even if it's the same function underneath,
>> wouldn't it?
>
> Yeah, and you still have to have a pg_proc entry for the original
> underlying function, else it doesn't get into the builtins list.
>
> It's worth pointing out also that while aliasing a builtin function
> after-the-fact like that is possible, lookup for it is substantially
> slower than a normal builtin (because we can't do a binary search on
> OID for it).  That's on top of the function-to-type-oid lookup you'll
> have to do within the function.
>
> I'm not convinced that using bigint-equivalent space for an enum is a
> mortal sin...
>

What about having the calling code fill in the io type oid in an extra field
in the flinfo? That possibly still leaves the builtin/aliasing issue ...
that deserves more thought. There's no rush on this.

cheers

andrew





Re: generic builtin functions

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> What about having the calling code fill in the io type oid in an extra field
> in the flinfo?

I don't think that's workable; for one thing there's the problem of
manual invocation of the I/O functions, which is not going to provide
any such special hack.  It also turns the enum proposal into a seriously
invasive patch (hitting all PLs both inside and outside the core, for
instance), at which point you'll start encountering some significant
push-back.

BTW, you might want to think about what'd be involved in supporting
arrays and domains over enums ...
        regards, tom lane


Re: generic builtin functions

From
Andrew Dunstan
Date:

Tom Lane wrote:

>"Andrew Dunstan" <andrew@dunslane.net> writes:
>
>>What about having the calling code fill in the io type oid in an extra field
>>in the flinfo?
>
>I don't think that's workable; for one thing there's the problem of
>manual invocation of the I/O functions, which is not going to provide
>any such special hack.  It also turns the enum proposal into a seriously
>invasive patch (hitting all PLs both inside and outside the core, for
>instance), at which point you'll start encountering some significant
>push-back.

Darn. I see that. Stuff like:
    tmp = DatumGetCString(FunctionCall1(&(desc->arg_out_func[i]),
                                        fcinfo->arg[i]));

At this stage I am probably going to go with your 64bit proposal, on the 
ground that it will permit some progress, and in the possibly vain hope 
that someone will have a flash of insight that will let us do it less 
redundantly in future.

>BTW, you might want to think about what'd be involved in supporting
>arrays and domains over enums ...

Yeah, on my list.

cheers

andrew


Re: generic builtin functions

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Still thinking a bit more about this ... how about we have output 
> functions take an optional second argument, which is the type oid?

No.  We just undid that for good and sufficient security reasons.
If an output function depends on anything more than the contents of
the object it's handed, it's vulnerable to being lied to.
http://archives.postgresql.org/pgsql-hackers/2005-04/msg00998.php

I realize that being told the wrong type ID might be less than
catastrophic for enum_out, but I'm going to fiercely resist putting back
any extra arguments for output functions.  The temptation to use them
unsafely is just too strong --- we've learned that the hard way more
than once already, and I don't want to repeat the same mistake yet again.
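
A toy illustration of the hazard, assuming two invented enum types. It contrasts an output routine that trusts a caller-supplied type OID with one that reads the type from the value itself. None of these names are PostgreSQL APIs.

```c
#include <stddef.h>
#include <stdint.h>

static const char *const color_labels[] = {"red", "green", "blue"};
static const char *const mood_labels[] = {"sad", "ok", "happy"};

enum { COLOR_OID = 70001, MOOD_OID = 70002 };  /* invented OIDs */

/* Shared label lookup over the two toy types. */
static const char *label_for(uint32_t type_oid, uint32_t ordinal)
{
    if (ordinal >= 3)
        return NULL;
    if (type_oid == COLOR_OID)
        return color_labels[ordinal];
    if (type_oid == MOOD_OID)
        return mood_labels[ordinal];
    return NULL;
}

/* Unsafe shape: the type comes from the caller, who may be wrong. */
const char *out_trusting_caller(uint32_t ordinal, uint32_t claimed_type_oid)
{
    return label_for(claimed_type_oid, ordinal);
}

/* Safer shape: the type travels inside the value itself. */
typedef struct TaggedEnum {
    uint32_t type_oid;
    uint32_t ordinal;
} TaggedEnum;

const char *out_self_describing(TaggedEnum v)
{
    return label_for(v.type_oid, v.ordinal);
}
```

Handing the first routine a colour's ordinal together with `MOOD_OID` silently prints a mood label. In this toy case the result is only a wrong string, which matches the "less than catastrophic" caveat above.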

> Input funcs get a typioparam and typmod argument in addition to the
> data value,

Entirely different situation because the only thing an input function
assumes is that it's been handed some text ... which it can validate.
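
A sketch of the input side, assuming a single hypothetical enum with three labels. Because the routine checks the text against the known labels, bad input is rejected rather than trusted. `color_in_sketch` is an invented name.

```c
#include <stddef.h>
#include <string.h>

static const char *const color_labels[] = {"red", "green", "blue"};
#define NCOLORS (sizeof(color_labels) / sizeof(color_labels[0]))

/* Input sketch: returns the ordinal for text, or -1 if text is not a
 * valid label for this type. */
int color_in_sketch(const char *text)
{
    if (text == NULL)
        return -1;
    for (size_t i = 0; i < NCOLORS; i++)
        if (strcmp(color_labels[i], text) == 0)
            return (int)i;
    return -1;
}
```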
        regards, tom lane


Re: generic builtin functions

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>>Still thinking a bit more about this ... how about we have output 
>>functions take an optional second argument, which is the type oid?
>
>No.  We just undid that for good and sufficient security reasons.
>If an output function depends on anything more than the contents of
>the object it's handed, it's vulnerable to being lied to.
>http://archives.postgresql.org/pgsql-hackers/2005-04/msg00998.php
>
>I realize that being told the wrong type ID might be less than
>catastrophic for enum_out, but I'm going to fiercely resist putting back
>any extra arguments for output functions.  The temptation to use them
>unsafely is just too strong --- we've learned that the hard way more
>than once already, and I don't want to repeat the same mistake yet again.
>
>>Input funcs get a typioparam and typmod argument in addition to the
>>data value,
>
>Entirely different situation because the only thing an input function
>assumes is that it's been handed some text ... which it can validate.

OK, if you resist fiercely I'm sure it won't happen, so I'll put away 
the asbestos underpants ;-)

I see in that discussion you say:

>Every rowtype Datum will carry its own concrete type.
>

Are we storing that on disk in every composite object? If not, where do 
we get it from before calling the output function?

cheers

andrew




Re: generic builtin functions

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I see in that discussion you say:

>> Every rowtype Datum will carry its own concrete type.

> Are we storing that on disk in every composite object?

Yup.  For a rowtype datum it's not a serious overhead.  I realize
that the same is not true of enums :-(
        regards, tom lane