Thread: enum types and binary queries

enum types and binary queries

From
"Merlin Moncure"
Date:
I noticed that enums are not available to be queried as binary through
the protocol.  Is this a known issue?  Too late to fix for 8.3?  This
is kind of a pain, because it forces any query that returns an enum to
return the entire result as text.  afaik, enums are the only POD type
to behave this way.

postgres=# create type foo as enum('foo');
CREATE TYPE

postgres=# copy (select 'foo'::foo) to '/home/postgres/foo.txt' binary;
ERROR:  no binary output function available for type foo

merlin


Re: enum types and binary queries

From
Andrew Dunstan
Date:

Merlin Moncure wrote:
> I noticed that enums are not available to be queried as binary through
> the protocol.  Is this a known issue?  Too late to fix for 8.3?  This
> is kind of a pain, because it forces any query that returns an enum to
> return the entire result as text.  afaik, enums are the only POD type
> to behave this way.
>
> postgres=# create type foo as enum('foo');
> CREATE TYPE
>
> postgres=# copy (select 'foo'::foo) to '/home/postgres/foo.txt' binary;
> ERROR:  no binary output function available for type foo
>   


The trouble is that an enum doesn't have an immutable internal binary value.

I guess we could say that the binary value is the integer offset of the 
value in the enum ordering, and translate it back on input. Providing 
the binary IO funcs shouldn't be too hard if we do that, unless I'm 
missing something.

cheers

andrew




Re: enum types and binary queries

From
Tom Lane
Date:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> I noticed that enums are not available to be queried as binary through
> the protocol.

What do you think the binary representation ought to be?  Copying OIDs
seems pretty useless.
        regards, tom lane


Re: enum types and binary queries

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Merlin Moncure wrote:
>> I noticed that enums are not available to be queried as binary through
>> the protocol.

> The trouble is that an enum doesn't have an immutable internal binary value.

> I guess we could say that the binary value is the integer offset of the 
> value in the enum ordering, and translate it back on input. Providing 
> the binary IO funcs shouldn't be too hard if we do that, unless I'm 
> missing something.

That would be expensive to get, wouldn't it?  The main point of binary
I/O is to be fast, so I'm not excited about an expensive conversion.

What if we just defined the binary format as being identical to the text
format, ie, the enum label?  A bit silly but at least it would eliminate
the gotcha that binary I/O fails.
        regards, tom lane


Re: enum types and binary queries

From
"korry.douglas"
Date:
> What if we just defined the binary format as being identical to the text
> format, ie, the enum label?  A bit silly but at least it would eliminate
> the gotcha that binary I/O fails.
>   
Seems like it would make a lot more sense to the client application that 
way too...
            -- Korry


Re: enum types and binary queries

From
Andrew Dunstan
Date:

Tom Lane wrote:
> What if we just defined the binary format as being identical to the text
> format, ie, the enum label?  A bit silly but at least it would eliminate
> the gotcha that binary I/O fails.
>
>     
>   

So we should pretty much mimic text_send/text_recv? If so I can probably 
get that done fairly quickly.

cheers

andrew


Re: enum types and binary queries

From
"Merlin Moncure"
Date:
On 8/30/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > I noticed that enums are not available to be queried as binary through
> > the protocol.
>
> What do you think the binary representation ought to be?  Copying OIDs
> seems pretty useless.

I'm on the fence on this one.

I actually think this would be ok, if you mean pg_enum.oid, or the
string would be fine too.  I agree that binary protocol is supposed to
be fast, and I can prefetch the pg_enum table to the client and do the
magic there.  Many other binary formats do similarly inscrutable
things.

One other very small observation: afaict, there appears to be no way
to list enum contents in psql (although you can list the enums
themselves in the type listing).  Maybe this should be possible?  I'm
willing to take a stab at these things if Andrew is busy.

merlin


Re: enum types and binary queries

From
Andrew Dunstan
Date:

Merlin Moncure wrote:
>
> One other very small observation: afaict, there appears to be no way
> to list enum contents in psql (although you can list the enums
> themselves in the type listing). 

select enum_range(null::myenumtype);

> I'm willing to take a stab at these things if Andrew is busy.
>
>
>   

I should have a cut of binary I/O done today, which I will send to you 
for testing and TomD for comment. We need to get it in by beta as it's a 
catalog change.

cheers

andrew


Re: enum types and binary queries

From
Tom Lane
Date:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 8/30/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What do you think the binary representation ought to be?  Copying OIDs
>> seems pretty useless.

> I actually think this would be ok, if you mean pg_enum.oid, or the
> string would be fine too.  I agree that binary protocol is supposed to
> be fast, and I can prefetch the pg_enum table to the client and do the
> magic there.  Many other binary formats do similarly inscrutable
> things.

Well, inscrutable is one thing and unportable is another.  It's supposed
to be possible to reload binary COPY data into a fresh database --- with
maybe some restrictions on the architecture being similar, for the more
machine-specific datatypes such as float.  If we emit raw OIDs then this
will never work, since the same type definition made in a fresh database
would have the same OIDs only by awe-inspiring coincidence.

Andrew's idea of using the enum ordinal value would meet that test, but
at least with the current layout of pg_enum it would be quite expensive
to do the conversion in either direction --- you'd have to fetch
multiple catalog rows.  I think we'd have to add another column showing
the ordinal value, and put an index on it, to make I/O reasonably fast.
Doesn't really seem worth it.
        regards, tom lane


Re: enum types and binary queries

From
Gregory Stark
Date:
"Merlin Moncure" <mmoncure@gmail.com> writes:

> On 8/30/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> > I noticed that enums are not available to be queried as binary through
>> > the protocol.
>>
>> What do you think the binary representation ought to be?  Copying OIDs
>> seems pretty useless.
>
> I'm on the fence on this one.
>
> I actually think this would be ok, if you mean pg_enum.oid, or the
> string would be fine too.  I agree that binary protocol is supposed to
> be fast, and I can prefetch the pg_enum table to the client and do the
> magic there.  Many other binary formats do similarly inscrutable
> things.

I think it would be ok only if a pg_dump/pg_restore reliably restored the same
oid->enum value mapping. Otherwise a binary dump is useless. But as I
understand it that's the case currently, is it?

You would also have to have some way for a client to simply look up the enum
mapping. Something like an SRF that returned setof(oid,name).

My first instinct was to just use the enum name but I'm starting to think this
would be better. It is more in line with the promise of enums which is that
the names are just labels and the data internally is fixed size.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: enum types and binary queries

From
Alvaro Herrera
Date:
Gregory Stark escribió:

> I think it would be ok only if a pg_dump/pg_restore reliably restored the same
> oid->enum value mapping. Otherwise a binary dump is useless. But as I
> understand it that's the case currently, is it?

That doesn't work if the dump is restored on a database that already has
those OIDs used for another enum.  The easy workaround would be to
"rewrite" the data to use the new OID, but believe me, you don't want to
go down that route, lest madness await you at the end.

-- 
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"


Re: enum types and binary queries

From
Gregory Stark
Date:
"Alvaro Herrera" <alvherre@commandprompt.com> writes:

> Gregory Stark escribió:
>
>> I think it would be ok only if a pg_dump/pg_restore reliably restored the same
>> oid->enum value mapping. Otherwise a binary dump is useless. But as I
>> understand it that's the case currently, is it?

er, lost a "not" in the editing of that, sorry.

> That doesn't work if the dump is restored on a database that already has
> those OIDs used for another enum.  The easy workaround would be to
> "rewrite" the data to use the new OID, but believe me, you don't want to
> go down that route, lest madness await you at the end.

enum OIDs are unique across enums? This seems like a strange way to do it. I
recall conversations about this a while back though and there were limitations
of the type system that led to this, right?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: enum types and binary queries

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> enum OIDs are unique across enums? This seems like a strange way to do it.

That decision was already made, we are not revisiting it (at least not
for 8.3).
        regards, tom lane


Re: enum types and binary queries

From
Andrew Dunstan
Date:

Gregory Stark wrote:
> enum OIDs are unique across enums? This seems like a strange way to do it. I
> recall conversations about this a while back though and there were limitations
> of the type system that led to this, right?
>   

No, not the type system as such. It stems from this quote from Tom:

> If an output function depends on anything more than the contents of
> the object it's handed, it's vulnerable to being lied to.
> http://archives.postgresql.org/pgsql-hackers/2005-04/msg00998.php

So the value passed to the enum_out function has to be sufficiently 
unique to be able to look up the label. This arrangement got the best 
combination of compactness and simplicity that we could come up with at 
the time.

Incidentally, FWIW, I have heard tales of considerable speedup from 
people being able to avoid using FKs/lookup tables by using enums.

cheers

andrew



Re: enum types and binary queries

From
Decibel!
Date:
On Fri, Aug 31, 2007 at 08:49:05AM -0400, Merlin Moncure wrote:
> On 8/30/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Merlin Moncure" <mmoncure@gmail.com> writes:
> > > I noticed that enums are not available to be queried as binary through
> > > the protocol.
> >
> > What do you think the binary representation ought to be?  Copying OIDs
> > seems pretty useless.
>
> I'm on the fence on this one.
>
> I actually think this would be ok, if you mean pg_enum.oid, or the
> string would be fine too.  I agree that binary protocol is supposed to
> be fast, and I can prefetch the pg_enum table to the client and do the
> magic there.  Many other binary formats do similarly inscrutable
> things.

The last time I worked on a project where we had C code access the
database, we added stuff to map C enums to ints in the database (along
with a parent table to store the enum label). ISTM it'd be good if we
had a way to get the numeric index out of an enum. I also like the idea
of having a fixed ordering to the labels in an enum.

> One other very small observation: afaict, there appears to be no way
> to list enum contents in psql (although you can list the enums
> themselves in the type listing).  Maybe this should be possible?  I'm
> willing to take a stab at these things if Andrew is busy.

Is there an SRF that will return this info? ISTM you should be able to
get the labels programmatically as well as via psql.
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: enum types and binary queries

From
Andrew Dunstan
Date:

Decibel! wrote:
>
> The last time I worked on a project where we had C code access the
> database, we added stuff to map C enums to ints in the database (along
> with a parent table to store the enum label). ISTM it'd be good if we
> had a way to get the numeric index out of an enum. 

If you mean here that you want to find the position in the ordering of 
an enum value, it would be trivial to write, searching in the results 
from enum_range().

Next cycle it might be worth adding a column to pg_enum with the 
explicit order. My serious worry, though, is that it might lead people 
to think they could alter that column and thereby change the ordering, 
which of course they can't. (Quite apart from anything else, a mutable 
ordering would play havoc with enums used in indexes.)

The technique of using a lookup table that you seem to refer to doesn't 
need any special support from the catalogs or the type system. It's used 
today in millions of applications. But it can have quite a high cost in 
extra joins required to get the labels and extra application complexity. 
For a case where the values in the domain of labels are truly fixed, 
enums offer a much more performant and much simpler way to go.

> I also like the idea
> of having a fixed ordering to the labels in an enum.
>   

I do not understand what this sentence means. The ordering *is* fixed - 
it is defined by the order in which the labels are given in the create 
type statement. And the oids that are assigned to the values in pg_enum 
are sorted before being assigned to the labels precisely so that they 
reflect this ordering. So rest assured that a given enum type will have 
a fixed ordering, and it will be consistent across a dump/restore. What 
will not necessarily be consistent is the actual oids used, making the 
oids unsuitable for use in binary output as noted upthread.
>   
>> One other very small observation: afaict, there appears to be no way
>> to list enum contents in psql (although you can list the enums
>> themselves in the type listing).  Maybe this should be possible?  I'm
>> willing to take a stab at these things if Andrew is busy.
>>     
>
> Is there an SRF that will return this info? ISTM you should be able to
> get the labels programmatically as well as via psql.
>   

Maybe you need to read 
http://developer.postgresql.org/pgdocs/postgres/functions-enum.html to 
see info we have made available. We fully expect this list of functions 
to grow as we discover how enums are used in practice.

cheers

andrew


Re: enum types and binary queries

From
Decibel!
Date:
On Fri, Aug 31, 2007 at 01:41:47PM -0400, Andrew Dunstan wrote:
> The technique of using a lookup table that you seem to refer to doesn't
> need any special support from the catalogs or the type system. It's used
> today in millions of applications. But it can have quite a high cost in
> extra joins required to get the labels and extra application complexity.
> For a case where the values in the domain of labels are truly fixed,
> enums offer a much more performant and much simpler way to go.
AIUI, in C code it's easiest to deal with the int value that a C enum
gets, rather than dealing with a label coming back from the database. I
know that's what we did where I worked; the enum column stored the
corresponding C int, and that's what was used as the PK in the lookup
table.

ISTM it'd be good if we could do the same with our enums (pass the int
value back instead of a label).

> >I also like the idea
> >of having a fixed ordering to the labels in an enum.
> >
>
> I do not understand what this sentence means. The ordering *is* fixed -
> it is defined by the order in which the labels are given in the create
> type statement. And the oids that are assigned to the values in pg_enum
> are sorted before being assigned to the labels precisely so that they
> reflect this ordering. So rest assured that a given enum type will have
> a fixed ordering, and it will be consistent across a dump/restore. What
> will not necessarily be consistent is the actual oids used, making the
> oids unsuitable for use in binary output as noted upthread.

What if the OID counter wraps in the middle of adding the labels? (IE:
create a 4 label ENUM when the OID counter is 1 number away from
wrapping).

If we ever add support for adding additional labels to enums this could
be an issue too.

> Maybe you need to read
> http://developer.postgresql.org/pgdocs/postgres/functions-enum.html to
> see info we have made available. We fully expect this list of functions
> to grow as we discover how enums are used in practice.

Looks good... should still be a psql command, imo. Perhaps as part of
\dT...
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: enum types and binary queries

From
Andrew Dunstan
Date:

Decibel! wrote:
> On Fri, Aug 31, 2007 at 01:41:47PM -0400, Andrew Dunstan wrote:
>   
>> The technique of using a lookup table that you seem to refer to doesn't 
>> need any special support from the catalogs or the type system. It's used 
>> today in millions of applications. But it can have quite a high cost in 
>> extra joins required to get the labels and extra application complexity. 
>> For a case where the values in the domain of labels are truly fixed, 
>> enums offer a much more performant and much simpler way to go.
>>     
>  
> AIUI, in C code it's easiest to deal with the int value that a C enum
> gets, rather than dealing with a label coming back from the database. I
> know that's what we did where I worked; the enum column stored the
> corresponding C int, and that's what was used as the PK in the lookup
> table.
>
> ISTM it'd be good if we could do the same with our enums (pass the int
> value back instead of a label).
>   


Jim, you and I have had this discussion before. The answer is the same 
as the last time you asked about this: because it breaks the abstraction.

If the C code doesn't need the label then why store it at all? You can 
just make the database field an int.

If you want to get the ordering offset of a particular enum value you 
can write a function in about 10 lines of C that will give it to you.


>   
>>> I also like the idea
>>> of having a fixed ordering to the labels in an enum.
>>>  
>>>       
>> I do not understand what this sentence means. The ordering *is* fixed - 
>> it is defined by the order in which the labels are given in the create 
>> type statement. And the oids that are assigned to the values in pg_enum 
>> are sorted before being assigned to the labels precisely so that they 
>> reflect this ordering. So rest assured that a given enum type will have 
>> a fixed ordering, and it will be consistent across a dump/restore. What 
>> will not necessarily be consistent is the actual oids used, making the 
>> oids unsuitable for use in binary output as noted upthread.
>>     
>
> What if the OID counter wraps in the middle of adding the labels? (IE:
> create a 4 label ENUM when the OID counter is 1 number away from
> wrapping).
>   

It will not be a problem. I have just explained that we sort them first. 
This is a furfy that has been raised before and explained before. See 
pg_enum.c starting around line 52. In particular:
         /* sort them, just in case counter wrapped from high to low */         qsort(oids, n, sizeof(Oid), oid_cmp);


> If we ever add support for adding additional labels to enums this could
> be an issue too.
>   

I doubt we will be doing it. You can get the effect by defining a new 
type and using the old labels.

>   
>> Maybe you need to read 
>> http://developer.postgresql.org/pgdocs/postgres/functions-enum.html to 
>> see info we have made available. We fully expect this list of functions 
>> to grow as we discover how enums are used in practice.
>>     
>
> Looks good... should still be a psql command, imo. Perhaps as part of
> \dT...
>   

We'd have to special case enums, or provide a special \d command to 
handle them. Not sure either is worth it when we have the functions anyway.

cheers

andrew



Re: enum types and binary queries

From
Andrew Dunstan
Date:

Andrew Dunstan wrote:
>
>
> This is a furfy that has been raised before and explained before.

Of course, as usual I misspelled it, the word is "furphy". I didn't 
realise that it was an Australianism. It means more or less "a red 
herring". Wikipedia says that it is a term particularly popular in 
Australian politics, and as some people know my late father was a noted 
practitioner of that art ;-)

cheers

andrew