Thread: Current enums patch

Current enums patch

From
Tom Dunstan
Date:
Hi all

Here's the current version of the enums patch. Not much change from last
time, the only thought-inducing stuff was fixing up some macros that
changed with the VARLENA changes, and adding a regression test to do
basic checking of RI behavior, after the discussions that we had
recently on the ri_trigger stuff with generic types. The actual behavior
was fixed by Tom's earlier patch, so this is just a sanity check.

Cheers

Tom

Attachment

Re: Current enums patch

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Tom Dunstan wrote:
> Hi all
>
> Here's the current version of the enums patch. Not much change from last
> time, the only thought-inducing stuff was fixing up some macros that
> changed with the VARLENA changes, and adding a regression test to do
> basic checking of RI behavior, after the discussions that we had
> recently on the ri_trigger stuff with generic types. The actual behavior
> was fixed by Tom's earlier patch, so this is just a sanity check.
>
> Cheers
>
> Tom

[ application/x-gzip is not supported, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Current enums patch

From
Peter Eisentraut
Date:
Am Dienstag, 27. März 2007 03:36 schrieb Tom Dunstan:
> Here's the current version of the enums patch. Not much change from last
> time, the only thought-inducing stuff was fixing up some macros that
> changed with the VARLENA changes, and adding a regression test to do
> basic checking of RI behavior, after the discussions that we had
> recently on the ri_trigger stuff with generic types. The actual behavior
> was fixed by Tom's earlier patch, so this is just a sanity check.

Your patch doesn't compile anymore.

ccache cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels
-fno-strict-aliasing-g -I. -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o parse_coerce.o
parse_coerce.c-MMD -MP -MF .deps/parse_coerce.Po 
parse_coerce.c: In function 'can_coerce_type':
parse_coerce.c:460: error: too few arguments to function 'find_coercion_pathway'
parse_coerce.c: In function 'find_coercion_pathway':
parse_coerce.c:1817: error: too few arguments to function 'find_coercion_pathway'
parse_coerce.c:1822: error: too few arguments to function 'find_coercion_pathway'

This was only changed a few days ago, so you need to update your patch.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Current enums patch

From
Andrew Dunstan
Date:
Peter Eisentraut wrote:
> Am Dienstag, 27. März 2007 03:36 schrieb Tom Dunstan:
>
>> Here's the current version of the enums patch. Not much change from last
>> time, the only thought-inducing stuff was fixing up some macros that
>> changed with the VARLENA changes, and adding a regression test to do
>> basic checking of RI behavior, after the discussions that we had
>> recently on the ri_trigger stuff with generic types. The actual behavior
>> was fixed by Tom's earlier patch, so this is just a sanity check.
>>
>
> Your patch doesn't compile anymore.
>
> ccache cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels
-fno-strict-aliasing-g -I. -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o parse_coerce.o
parse_coerce.c-MMD -MP -MF .deps/parse_coerce.Po 
> parse_coerce.c: In function 'can_coerce_type':
> parse_coerce.c:460: error: too few arguments to function 'find_coercion_pathway'
> parse_coerce.c: In function 'find_coercion_pathway':
> parse_coerce.c:1817: error: too few arguments to function 'find_coercion_pathway'
> parse_coerce.c:1822: error: too few arguments to function 'find_coercion_pathway'
>
> This was only changed a few days ago, so you need to update your patch.
>
>

Peter,

If you want to review or test the feature, the attached patch can be
used as a replacement for the portion that affects parse_coerce.c, and
with this it compiles and passes regression. I think it's correct but it
should still be OKed by at least one Tom. :-)

cheers

andrew
Index: parse_coerce.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/parse_coerce.c,v
retrieving revision 2.152
diff -c -r2.152 parse_coerce.c
*** parse_coerce.c    27 Mar 2007 23:21:10 -0000    2.152
--- parse_coerce.c    31 Mar 2007 18:53:08 -0000
***************
*** 132,137 ****
--- 132,138 ----
      }
      if (targetTypeId == ANYOID ||
          targetTypeId == ANYELEMENTOID ||
+         targetTypeId == ANYENUMOID ||
          (targetTypeId == ANYARRAYOID && inputTypeId != UNKNOWNOID))
      {
          /*
***************
*** 406,414 ****
          if (targetTypeId == ANYOID)
              continue;

!         /* accept if target is ANYARRAY or ANYELEMENT, for now */
          if (targetTypeId == ANYARRAYOID ||
!             targetTypeId == ANYELEMENTOID)
          {
              have_generics = true;        /* do more checking later */
              continue;
--- 407,416 ----
          if (targetTypeId == ANYOID)
              continue;

!         /* accept if target is ANYARRAY, ANYELEMENT or ANYENUM, for now */
          if (targetTypeId == ANYARRAYOID ||
!             targetTypeId == ANYELEMENTOID ||
!             targetTypeId == ANYENUMOID)
          {
              have_generics = true;        /* do more checking later */
              continue;
***************
*** 451,456 ****
--- 453,466 ----
              continue;

          /*
+          * If input is an enum, can ANYENUM be cast to target?
+          */
+         if (is_type_enum(inputTypeId) &&
+             find_coercion_pathway(targetTypeId, ANYENUMOID,
+                                   ccontext, &funcId, &arrayCoerce))
+             continue;
+
+         /*
           * Else, cannot coerce at this argument position
           */
          return false;
***************
*** 1070,1076 ****
      Oid            array_typeid = InvalidOid;
      Oid            array_typelem;
      bool        have_anyelement = false;
!
      /*
       * Loop through the arguments to see if we have any that are ANYARRAY or
       * ANYELEMENT. If so, require the actual types to be self-consistent
--- 1080,1086 ----
      Oid            array_typeid = InvalidOid;
      Oid            array_typelem;
      bool        have_anyelement = false;
!     bool        have_enum = false;
      /*
       * Loop through the arguments to see if we have any that are ANYARRAY or
       * ANYELEMENT. If so, require the actual types to be self-consistent
***************
*** 1079,1085 ****
      {
          Oid            actual_type = actual_arg_types[j];

!         if (declared_arg_types[j] == ANYELEMENTOID)
          {
              have_anyelement = true;
              if (actual_type == UNKNOWNOID)
--- 1089,1096 ----
      {
          Oid            actual_type = actual_arg_types[j];

!         if (declared_arg_types[j] == ANYELEMENTOID ||
!             declared_arg_types[j] == ANYENUMOID)
          {
              have_anyelement = true;
              if (actual_type == UNKNOWNOID)
***************
*** 1087,1092 ****
--- 1098,1105 ----
              if (OidIsValid(elem_typeid) && actual_type != elem_typeid)
                  return false;
              elem_typeid = actual_type;
+             if (declared_arg_types[j] == ANYENUMOID)
+                 have_enum = true;
          }
          else if (declared_arg_types[j] == ANYARRAYOID)
          {
***************
*** 1127,1132 ****
--- 1140,1151 ----
          }
      }

+     if (have_enum)
+     {
+         /* is the given type as enum type? */
+         return is_type_enum(elem_typeid);
+     }
+
      /* Looks valid */
      return true;
  }
***************
*** 1180,1186 ****
      Oid            elem_typeid = InvalidOid;
      Oid            array_typeid = InvalidOid;
      Oid            array_typelem;
!     bool        have_anyelement = (rettype == ANYELEMENTOID);

      /*
       * Loop through the arguments to see if we have any that are ANYARRAY or
--- 1199,1206 ----
      Oid            elem_typeid = InvalidOid;
      Oid            array_typeid = InvalidOid;
      Oid            array_typelem;
!     bool        have_anyelement = (rettype == ANYELEMENTOID ||
!                                    rettype == ANYENUMOID);

      /*
       * Loop through the arguments to see if we have any that are ANYARRAY or
***************
*** 1190,1196 ****
      {
          Oid            actual_type = actual_arg_types[j];

!         if (declared_arg_types[j] == ANYELEMENTOID)
          {
              have_generics = have_anyelement = true;
              if (actual_type == UNKNOWNOID)
--- 1210,1217 ----
      {
          Oid            actual_type = actual_arg_types[j];

!         if (declared_arg_types[j] == ANYELEMENTOID ||
!             declared_arg_types[j] == ANYENUMOID)
          {
              have_generics = have_anyelement = true;
              if (actual_type == UNKNOWNOID)
***************
*** 1289,1295 ****
              if (actual_type != UNKNOWNOID)
                  continue;

!             if (declared_arg_types[j] == ANYELEMENTOID)
                  declared_arg_types[j] = elem_typeid;
              else if (declared_arg_types[j] == ANYARRAYOID)
              {
--- 1310,1317 ----
              if (actual_type != UNKNOWNOID)
                  continue;

!             if (declared_arg_types[j] == ANYELEMENTOID ||
!                 declared_arg_types[j] == ANYENUMOID)
                  declared_arg_types[j] = elem_typeid;
              else if (declared_arg_types[j] == ANYARRAYOID)
              {
***************
*** 1323,1329 ****
      }

      /* if we return ANYELEMENTOID use the appropriate argument type */
!     if (rettype == ANYELEMENTOID)
          return elem_typeid;

      /* we don't return a generic type; send back the original return type */
--- 1345,1351 ----
      }

      /* if we return ANYELEMENTOID use the appropriate argument type */
!     if (rettype == ANYELEMENTOID || rettype == ANYENUMOID)
          return elem_typeid;

      /* we don't return a generic type; send back the original return type */
***************
*** 1362,1368 ****
                                  format_type_be(context_actual_type))));
              return context_actual_type;
          }
!         else if (context_declared_type == ANYELEMENTOID)
          {
              /* Use the array type corresponding to actual type */
              Oid            array_typeid = get_array_type(context_actual_type);
--- 1384,1391 ----
                                  format_type_be(context_actual_type))));
              return context_actual_type;
          }
!         else if (context_declared_type == ANYELEMENTOID ||
!                  context_declared_type == ANYENUMOID)
          {
              /* Use the array type corresponding to actual type */
              Oid            array_typeid = get_array_type(context_actual_type);
***************
*** 1375,1381 ****
              return array_typeid;
          }
      }
!     else if (declared_type == ANYELEMENTOID)
      {
          if (context_declared_type == ANYARRAYOID)
          {
--- 1398,1404 ----
              return array_typeid;
          }
      }
!     else if (declared_type == ANYELEMENTOID || declared_type == ANYENUMOID)
      {
          if (context_declared_type == ANYARRAYOID)
          {
***************
*** 1389,1395 ****
                                  format_type_be(context_actual_type))));
              return array_typelem;
          }
!         else if (context_declared_type == ANYELEMENTOID)
          {
              /* Use the actual type; it doesn't matter if array or not */
              return context_actual_type;
--- 1412,1419 ----
                                  format_type_be(context_actual_type))));
              return array_typelem;
          }
!         else if (context_declared_type == ANYELEMENTOID ||
!                  context_declared_type == ANYENUMOID)
          {
              /* Use the actual type; it doesn't matter if array or not */
              return context_actual_type;
***************
*** 1502,1507 ****
--- 1526,1532 ----
          case (INTERNALOID):
          case (OPAQUEOID):
          case (ANYELEMENTOID):
+         case (ANYENUMOID):
              result = GENERIC_TYPE;
              break;

***************
*** 1634,1639 ****
--- 1659,1668 ----
      if (srctype == targettype)
          return true;

+     /* Check for enums */
+     if (targettype == ANYENUMOID)
+         return is_type_enum(srctype);
+
      /* If srctype is a domain, reduce to its base type */
      if (OidIsValid(srctype))
          srctype = getBaseType(srctype);
***************
*** 1742,1747 ****
--- 1771,1790 ----

          ReleaseSysCache(tuple);
      }
+     /*
+      * Deal with enums. If the input type is an enum, and we haven't found
+      * an explicit pg_cast entry above, retry with ANYENUM.
+      */
+     else if (is_type_enum(sourceTypeId))
+     {
+         result = find_coercion_pathway(targetTypeId, ANYENUMOID,
+                                        ccontext, funcid, arrayCoerce);
+     }
+     else if (is_type_enum(targetTypeId))
+     {
+         result = find_coercion_pathway(ANYENUMOID, sourceTypeId,
+                                        ccontext, funcid, arrayCoerce);
+     }
      else
      {
          /*
***************
*** 1777,1782 ****
--- 1820,1826 ----
                  result = true;
              }
          }
+
      }

      return result;

Re: Current enums patch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> If you want to review or test the feature, the attached patch can be
> used as a replacement for the portion that affects parse_coerce.c, and
> with this it compiles and passes regression. I think it's correct but it
> should still be OKed by at least one Tom. :-)

Since Neil seems to have dropped the ball on this, I'm going to pick it
up and review/apply it.  I need a break from thinking about HOT and
other bizarre schemes ;-)

Barring objection from Tom D, I'll start with this version.

            regards, tom lane

Re: Current enums patch

From
Tom Lane
Date:
>>> Here's the current version of the enums patch.

[ sounds of reviewing... ]  Is there a specific reason for
pg_enum.enumname to be type name and not type text?  ISTM that type name
wastes space (because most labels will probably be a lot shorter than 63
bytes) and at the same time imposes an implementation restriction that
we don't need to have.  It would make sense if the enum labels were
treated syntactically as SQL identifiers, but they're treated as
strings.  And there's no particular win to be had by having a
fixed-length struct, since there's no more fields anyway.

Unless someone objects, I'll change this and also revert to the
enumlabel name that seems to have been used originally (it was still
used in the docs).  It seems more readable somehow (I guess it's the
lack of either ascenders or descenders in "enumname").

            regards, tom lane

Re: Current enums patch

From
Andrew Dunstan
Date:
Tom Lane wrote:
>>>> Here's the current version of the enums patch.
>>>>
>
> [ sounds of reviewing... ]

(What are those? It's a bit hard to imagine you singing "doo di doo doo"
a la Homer while reviewing ....)

> Is there a specific reason for
> pg_enum.enumname to be type name and not type text?  ISTM that type name
> wastes space (because most labels will probably be a lot shorter than 63
> bytes) and at the same time imposes an implementation restriction that
> we don't need to have.  It would make sense if the enum labels were
> treated syntactically as SQL identifiers, but they're treated as
> strings.  And there's no particular win to be had by having a
> fixed-length struct, since there's no more fields anyway.
>

IIRC at one stage Tom wanted to try to make these identifiers, but that
was quickly abandoned. This might be a hangover from that. If someone
wants to use an insanely long enum label I guess that's their lookout.



cheers

andrew

Re: Current enums patch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Is there a specific reason for
>> pg_enum.enumname to be type name and not type text?

> IIRC at one stage Tom wanted to try to make these identifiers, but that
> was quickly abandoned. This might be a hangover from that.

Actually I think I see the reason: it's a bit of a pain in the neck to
use the syscache mechanism with text-type lookup keys.  I'm not 100%
convinced that we really need to have syscaches on pg_enum, but if those
stay then it's probably not worth the trouble to avoid the limitation.

            regards, tom lane

Re: Current enums patch

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> Tom Lane wrote:
>>
>>> Is there a specific reason for
>>> pg_enum.enumname to be type name and not type text?
>>>
>
>
>> IIRC at one stage Tom wanted to try to make these identifiers, but that
>> was quickly abandoned. This might be a hangover from that.
>>
>
> Actually I think I see the reason: it's a bit of a pain in the neck to
> use the syscache mechanism with text-type lookup keys.  I'm not 100%
> convinced that we really need to have syscaches on pg_enum, but if those
> stay then it's probably not worth the trouble to avoid the limitation.
>
>
>

That rings a faint bell.

If we don't have syscaches on pg_enum won't enum i/o get more expensive?

cheers

andrew


Re: Current enums patch

From
Tom Dunstan
Date:
Hi all! Thanks for reviewing the patch!

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> Is there a specific reason for
>>> pg_enum.enumname to be type name and not type text?
>
>> IIRC at one stage Tom wanted to try to make these identifiers, but that
>> was quickly abandoned. This might be a hangover from that.
>
> Actually I think I see the reason: it's a bit of a pain in the neck to
> use the syscache mechanism with text-type lookup keys.  I'm not 100%
> convinced that we really need to have syscaches on pg_enum, but if those
> stay then it's probably not worth the trouble to avoid the limitation.

Yeah, that was the reason IIRC. The syscaches are used by the I/O
functions: The one keyed on the pg_enum OID is for output, and the one
keyed on the type OID and label, err, name, are for input. As suggested
by a certain party here [1]. There didn't seem to be any text-like key
types to use in the syscache except the name type, and I didn't see the
63 byte limit being a big deal, that's way bigger than any sane enum
name that I've ever seen.

If we ditched the second syscache, we'd want some other way to convert a
  type OID and name into the enum value oid efficiently. I originally
suggested having a cache that got hooked onto an fn_extra field; that
idea could be resurrected if you don't like the syscache.

Cheers

Tom


1[] http://archives.postgresql.org/pgsql-hackers/2006-08/msg01022.php

Re: Current enums patch

From
Tom Dunstan
Date:
Tom Lane wrote:
 > Unless someone objects, I'll change this and also revert to the
 > enumlabel name that seems to have been used originally (it was still
 > used in the docs).  It seems more readable somehow (I guess it's the
 > lack of either ascenders or descenders in "enumname").

The name/text thing is discussed downthread. I actually started out
calling the field the name and changed it to the label, but perhaps I
only did that in the docs. It was probably while I was writing the docs
that I realized that name could refer to the enum type name or the value
name, which was confusing, but "value name" was kinda cumbersome, hence
"label". Change it over with my blessing.

Cheers

Tom

Re: Current enums patch

From
Tom Lane
Date:
Tom Dunstan <pgsql@tomd.cc> writes:
> If we ditched the second syscache, we'd want some other way to convert a
>   type OID and name into the enum value oid efficiently.

True.  OK, never mind that then.  I'll still rename it as per your
other comment.

            regards, tom lane

Re: Current enums patch

From
Tom Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> If you want to review or test the feature, the attached patch can be
>> used as a replacement for the portion that affects parse_coerce.c, and
>> with this it compiles and passes regression. I think it's correct but it
>> should still be OKed by at least one Tom. :-)

 > Barring objection from Tom D, I'll start with this version.

OK, I've now had a chance to look at Andrew's update of the patch, which
just seems to pass through the new arrayCoerce parameter to the
find_coercion_pathway calls. It almost doesn't matter what gets passed
in: find_coercion_pathway should never set that to true in our calls to
it in the enum code, as we're passing ANYENUMOID through to the recursed
call and that'll never hit the array coercion branch.

In summary: looks good to me!

Cheers

Tom

Re: Current enums patch

From
Alvaro Herrera
Date:
Tom Lane wrote:

> Unless someone objects, I'll change this and also revert to the
> enumlabel name that seems to have been used originally (it was still
> used in the docs).  It seems more readable somehow (I guess it's the
> lack of either ascenders or descenders in "enumname").

Wow, I wasn't aware we were picking our terms based on typography.  It
is kind of cool.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Current enums patch

From
Tom Lane
Date:
BTW, I've got fairly serious problems with a few of the "cuter" parts of
the patch:

enum_first, enum_last: these return ANYENUM but violate the rule that
a polymorphic-result function must have a polymorphic input argument
from which the parser may deduce the actual output type.

enum_range_all: same problem except ANYARRAY result.

text_enum: same problem as above, plus not acceptable to assume that it
gets the right enum OID, plus very definitely not acceptable to assume
that OID and typmod are the same size, plus I don't like the
undocumented kluge in getBaseTypeAndTypmod that is pretending to supply
the type OID for some small fraction of possible invocation cases.

I think text_enum is probably toast.  We might be able to salvage the
other three if we can think of some reasonable approach to type
determination.  One rather ugly possibility is that the argument is a
value of the target type --- which you can always have as null, so

    enum_first(null::rainbow) = 'red'

where enum_first is declared as taking and returning anyenum.  This
seems a bit yucky as opposed to the regtype approach in the patch,
and yet there are cases where it would be more handy --- eg, if
you are working with a table column "col" of some enum type, you
can do enum_first(col) instead of hardwiring the enum name.

There might be other better ways, though.  Thoughts?

            regards, tom lane

Re: Current enums patch

From
Andrew Dunstan
Date:
Tom Lane wrote:
> enum_first, enum_last: these return ANYENUM but violate the rule that
> a polymorphic-result function must have a polymorphic input argument
> from which the parser may deduce the actual output type.
>

Is this a tragedy when the supplied parameter gives the result type
directly?

If it really is, maybe we should return text instead of the enum
directly (or array of text in the case of enum_range).



cheers

andrew



Re: Current enums patch

From
Tom Dunstan
Date:
Andrew Dunstan wrote:
> Tom Lane wrote:
>> enum_first, enum_last: these return ANYENUM but violate the rule that
>> a polymorphic-result function must have a polymorphic input argument
>> from which the parser may deduce the actual output type.
>
> Is this a tragedy when the supplied parameter gives the result type
> directly?

I've been having a play with this. If you create a function taking an
enum type as a parameter, you can feed the output from enum_first into
it, regardless of the type given to enum_first. I doubt that it would be
a problem in practice, but it's certainly not great.

 > One rather ugly possibility is that the argument is a
 > value of the target type --- which you can always have as null, so
 >
 >     enum_first(null::rainbow) = 'red'
 >
 > where enum_first is declared as taking and returning anyenum.

I don't think that'll fly. If it's passed a null value, how does
enum_first know which enum type it's dealing with? Can you get the type
from the null value in some way?

 > This
 > seems a bit yucky as opposed to the regtype approach in the patch,
 > and yet there are cases where it would be more handy --- eg, if
 > you are working with a table column "col" of some enum type, you
 > can do enum_first(col) instead of hardwiring the enum name.

That's ok, as long as nulls work.

 > There might be other better ways, though.  Thoughts?

*Ponder*. In java, you can tie a generic return value to a particular
class by passing the class in as a bound generic parameter... could we
have a special rule that would look for e.g. a regtype as the first
parameter if the return type is generic and there are no generic parameters?


As a side note, the helper functions were intended for people writing
functions in plpgsql or whatever, allowing them to not hardcode the
values of the enum in their function. I consider them nice-to-have
rather than definitely required. If we can't come up with a nice way to
do them for 8.3, I'm not absolutely wedded to them. It would be *nice*,
though.

I really would like the cast from text, though, but I'll deal with that
in another email.

Regards

Tom

Re: Current enums patch

From
Tom Dunstan
Date:
> text_enum: same problem as above, plus not acceptable to assume that it
> gets the right enum OID, plus very definitely not acceptable to assume
> that OID and typmod are the same size, plus I don't like the
> undocumented kluge in getBaseTypeAndTypmod that is pretending to supply
> the type OID for some small fraction of possible invocation cases.
>
> I think text_enum is probably toast.

This was the ugliest part of the patch, and I mentioned it explicitly in
the original submission because I was uncomfortable about it. The proper
solution seemed to be to have another allowed cast function signature
that would take the regtype rather than the typmod. That got just a
little beyond what I wanted to do in the patch, and ugly as the
getBaseTypeAndTypmod hack was, it seemed less intrusive.

Another way to skirt the issue was to simply set the typmod of enum
types to have their own OID, but a) I wasn't sure what other things the
system might be inferring from the presence of a typmod, and b) you just
stated that we can't assume that typmod is big enough to hold an OID anyway.

I'm less concerned about the generic return type in this case, since the
parser should know the return type when an explicit cast has been
called. <Goes to check> Hmm, ok, maybe not. Looks like it's using the
return type of the cast function and throwing away the explicit cast
information. That's not very nice, but can probably be fixed in the parser.


Thoughts?

Tom

Re: Current enums patch

From
Tom Lane
Date:
Tom Dunstan <pgsql@tomd.cc> writes:
> Andrew Dunstan wrote:
>> Tom Lane wrote:
>>> enum_first, enum_last: these return ANYENUM but violate the rule that
>>> a polymorphic-result function must have a polymorphic input argument
>>> from which the parser may deduce the actual output type.
>>
>> Is this a tragedy when the supplied parameter gives the result type
>> directly?

> I've been having a play with this. If you create a function taking an
> enum type as a parameter, you can feed the output from enum_first into
> it, regardless of the type given to enum_first. I doubt that it would be
> a problem in practice, but it's certainly not great.

Well, that's exactly the point: the proposed functions break the type
system by allowing values of one enum type to be fed to a function
expecting a different one.  Even though that's unlikely to cause a
system crash, it's still wrong.  We need to define these functions in a
way that allows type safety to be checked at parse time, the same as
every other type is required to be.

>>> One rather ugly possibility is that the argument is a
>>> value of the target type --- which you can always have as null, so
>>>
>>> enum_first(null::rainbow) = 'red'
>>>
>>> where enum_first is declared as taking and returning anyenum.

> I don't think that'll fly. If it's passed a null value, how does
> enum_first know which enum type it's dealing with? Can you get the type
> from the null value in some way?

The null Datum itself obviously doesn't carry that info, but the
expression tree does, and there are provisions for letting functions
retrieve that info --- see get_fn_expr_rettype and get_fn_expr_argtype.

It occurred to me this morning that get_fn_expr_rettype might
represent salvation for text_enum as well, though I've not tried
it yet.

> ... could we
> have a special rule that would look for e.g. a regtype as the first
> parameter if the return type is generic and there are no generic parameters?

I thought about that too but don't like it much.  The problem is mainly
that it can only work for a constant regtype parameter.

            regards, tom lane

Re: Current enums patch

From
Tom Dunstan
Date:
Tom Lane wrote:
> The null Datum itself obviously doesn't carry that info, but the
> expression tree does, and there are provisions for letting functions
> retrieve that info --- see get_fn_expr_rettype and get_fn_expr_argtype.

Hmm. I vaguely remember that there was some feeling that the PLs
wouldn't always fill out the FmgrInfo struct, but perhaps that was just
the case with I/O functions.

>> ... could we
>> have a special rule that would look for e.g. a regtype as the first
>> parameter if the return type is generic and there are no generic parameters?
>
> I thought about that too but don't like it much.  The problem is mainly
> that it can only work for a constant regtype parameter.

OK, I give up. :) Why?

Thanks

Tom

Re: Current enums patch

From
Tom Lane
Date:
Tom Dunstan <pgsql@tomd.cc> writes:
> Tom Lane wrote:
>>> ... could we
>>> have a special rule that would look for e.g. a regtype as the first
>>> parameter if the return type is generic and there are no generic parameters?
>>
>> I thought about that too but don't like it much.  The problem is mainly
>> that it can only work for a constant regtype parameter.

> OK, I give up. :) Why?

Because the whole point is that the type has to be known at parse time.

I've got it working with get_fn_expr_argtype and it seems fairly
reasonable.

            regards, tom lane

Re: Current enums patch

From
Tom Dunstan
Date:
Tom Lane wrote:
>> OK, I give up. :) Why?
>
> Because the whole point is that the type has to be known at parse time.

Oh, duh. :)

> I've got it working with get_fn_expr_argtype and it seems fairly
> reasonable.

Cool!

Thanks

Tom

Re: Current enums patch

From
Tom Lane
Date:
Tom Dunstan <pgsql@tomd.cc> writes:
> Here's the current version of the enums patch.

Applied with revisions --- some cosmetic, some not so much.
Attached is the patch-as-applied if you care to compare; feel free
to ask questions about anything not obvious.

            regards, tom lane


Attachment

Re: Current enums patch

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Tom Dunstan <pgsql@tomd.cc> writes:
>> Here's the current version of the enums patch.
>
> Applied with revisions --- some cosmetic, some not so much.
> Attached is the patch-as-applied if you care to compare; feel free
> to ask questions about anything not obvious.

There's a little bug:

postgres=#  CREATE TYPE aenum AS ENUM ('a','b','c'); CREATE TYPE
postgres=#  CREATE TABLE t (e aenum); CREATE TABLE postgres=# INSERT
INTO t VALUES

('foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo');
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=#

In the server log:
TRAP: FailedAssertion("!(keylen < 64)", File: "hashfunc.c", Line: 145)


--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Current enums patch

From
Andrew Dunstan
Date:
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Tom Dunstan <pgsql@tomd.cc> writes:
>>> Here's the current version of the enums patch.
>>
>> Applied with revisions --- some cosmetic, some not so much.
>> Attached is the patch-as-applied if you care to compare; feel free
>> to ask questions about anything not obvious.
>
> There's a little bug:
>
> postgres=#  CREATE TYPE aenum AS ENUM ('a','b','c'); CREATE TYPE
> postgres=#  CREATE TABLE t (e aenum); CREATE TABLE postgres=# INSERT
> INTO t VALUES
>
('foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo');

>
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
> postgres=#
>
> In the server log:
> TRAP: FailedAssertion("!(keylen < 64)", File: "hashfunc.c", Line: 145)
>
>
Looks like we need to check the length on type creation too.

I'll fix later if not beaten to it.

cheers

andrew

Re: Current enums patch

From
"Tom Dunstan"
Date:
> Looks like we need to check the length on type creation
> too.
>
> I'll fix later if not beaten to it.

It works for me (ie fails with an appropriate error) locally
on Linux FC6 x86-64. Perhaps this platform initializes
memory to zero on allocation? I dunno. Anyway, if you can
reproduce it, please have a go, I'm at work and won't be
able to do anything more in depth for some time.

I could have sworn that I had some bounds checking in there
at one point, but it doesn't seem to be in the latest
version of the code. *shrug*. It should be both in
cstring_enum() in enum.c and in DefineEnum() in typecmds.c.

Cheers

Tom

Re: Current enums patch

From
Tom Lane
Date:
"Tom Dunstan" <pgsql@tomd.cc> writes:
>> Looks like we need to check the length on type creation
>> too.

> It works for me (ie fails with an appropriate error) locally
> on Linux FC6 x86-64. Perhaps this platform initializes
> memory to zero on allocation?

Sounds more like you're testing without asserts enabled; tut tut.

            regards, tom lane

Re: Current enums patch

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> There's a little bug:

> postgres=#  CREATE TYPE aenum AS ENUM ('a','b','c'); CREATE TYPE
> postgres=#  CREATE TABLE t (e aenum); CREATE TABLE postgres=# INSERT
> INTO t VALUES
>
('foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo');
> server closed the connection unexpectedly

Hm, I suppose we should apply truncate_identifier rather than letting
the strings be blindly truncated (perhaps in mid-character).  Should we
have it throw the truncation NOTICE, or not?  First thought is to do so
during CREATE TYPE but not during plain enum_in().

            regards, tom lane

Re: Current enums patch

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>
>> There's a little bug:
>>
>
>
>> postgres=#  CREATE TYPE aenum AS ENUM ('a','b','c'); CREATE TYPE
>> postgres=#  CREATE TABLE t (e aenum); CREATE TABLE postgres=# INSERT
>> INTO t VALUES
>>
('foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo');
>> server closed the connection unexpectedly
>>
>
> Hm, I suppose we should apply truncate_identifier rather than letting
> the strings be blindly truncated (perhaps in mid-character).  Should we
> have it throw the truncation NOTICE, or not?  First thought is to do so
> during CREATE TYPE but not during plain enum_in().
>

I don't see much point in truncating.

The patch I have so far gives the regression output shown below (yes, I
should make the messages more consistent).

In fact the truncation and associated NOTICE just strike me as confusing.

cheers

andrew

+ -- Name, Values too long
+ --
+ CREATE TYPE
+
abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789
+  AS ENUM('a');
+ NOTICE:  identifier
"abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789"
will be truncated to
"abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxy"
+ ERROR:  type names must be 62 characters or less
+ CREATE TYPE toolong AS ENUM
+
('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789');
+ ERROR:  invalid  enum label
"abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789":
must be 63 characters or less
+ INSERT INTO enumtest VALUES
+
('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789');
+ ERROR:  input value too long (74) for enum:
"abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789"
+ --





Re: Current enums patch

From
Tom Dunstan
Date:
>> Hm, I suppose we should apply truncate_identifier rather than letting
>> the strings be blindly truncated (perhaps in mid-character).  Should we
>> have it throw the truncation NOTICE, or not?  First thought is to do so
>> during CREATE TYPE but not during plain enum_in().
>>
>
> I don't see much point in truncating.
>
> The patch I have so far gives the regression output shown below (yes, I
> should make the messages more consistent).
>
> In fact the truncation and associated NOTICE just strike me as confusing.

[snip]

 > + ERROR:  input value too long (74) for enum:
 > "abcdefghijklmnopqrsatuvwxyz012345 etc etc

I was about to suggest that we just truncate the value in the input
function and look it up on the basis that if it's too long, the lookup
will fail and we can just tell the user that it wasn't a valid value.
But if there's a valid value that has the same 63 bytes as the first 63
of the too-long input string, we'll end up looking up the valid one
wrongly. So we do need to test for length and die at that point. Can we
just fail with the same error message as trying to input a smaller, but
similarly invalid string?

At create time, we should definitely throw an error... if we just
truncate the value at byte 64 (with a notice or not) we might truncate
in the middle of a multi-byte character. Yuk.

Cheers

Tom

Re: Current enums patch

From
Tom Lane
Date:
Tom Dunstan <pgsql@tomd.cc> writes:
> I was about to suggest that we just truncate the value in the input
> function and look it up on the basis that if it's too long, the lookup
> will fail and we can just tell the user that it wasn't a valid value.
> But if there's a valid value that has the same 63 bytes as the first 63
> of the too-long input string, we'll end up looking up the valid one
> wrongly. So we do need to test for length and die at that point. Can we
> just fail with the same error message as trying to input a smaller, but
> similarly invalid string?

> At create time, we should definitely throw an error... if we just
> truncate the value at byte 64 (with a notice or not) we might truncate
> in the middle of a multi-byte character. Yuk.

While all this reasoning is perfectly OK on its own terms, it ignores
the precedents of SQL identifier handling.  Maybe we should revisit the
question of whether the labels are identifiers.

            regards, tom lane

Re: Current enums patch

From
Andrew Dunstan
Date:
Tom Lane wrote:
>
> While all this reasoning is perfectly OK on its own terms, it ignores
> the precedents of SQL identifier handling.  Maybe we should revisit the
> question of whether the labels are identifiers.
>
>

If we do that can we still cache the values in the syscache? My
impression from what you and TomD said was that it would be at least
more difficult. If it's possible I'm all in favor.

(Side note: ISTM there is a pretty good case to move the truncation code
from the lexer to the parser where it can be invoked according to
context. Maybe something to think about next cycle.).

cheers

andrew

Re: Current enums patch

From
Tom Dunstan
Date:
Tom Lane wrote:
> While all this reasoning is perfectly OK on its own terms, it ignores
> the precedents of SQL identifier handling.  Maybe we should revisit the
> question of whether the labels are identifiers.

Implying that they shouldn't be quoted like text (or should be
double-quoted if required)? Originally when discussing the patch I
thought that this was a good idea, but now I'm not so sure. How does one
set an enum value from e.g. a JDBC PreparedStatement in that scenario?

Cheers

Tom

Re: Current enums patch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> While all this reasoning is perfectly OK on its own terms, it ignores
>> the precedents of SQL identifier handling.  Maybe we should revisit the
>> question of whether the labels are identifiers.

> If we do that can we still cache the values in the syscache?

Sure, as long as we're storing them as "name" it's not a problem.

But probably making them act like identifiers is not a good idea,
because I doubt we want automatic downcasing in enum_in.  People
wouldn't be happy if they had to write WHERE color = '"Red"' or
something like that to get at a mixed-case enum label.  Let's
just throw the error instead.  (I agree that enum_in can just fail
with "no such label", but CREATE TYPE ought to give a specific
error about it.)

            regards, tom lane

Re: Current enums patch

From
Andrew Dunstan
Date:
Tom Dunstan wrote:
> Tom Lane wrote:
>> While all this reasoning is perfectly OK on its own terms, it ignores
>> the precedents of SQL identifier handling.  Maybe we should revisit the
>> question of whether the labels are identifiers.
>
> Implying that they shouldn't be quoted like text (or should be
> double-quoted if required)? Originally when discussing the patch I
> thought that this was a good idea, but now I'm not so sure. How does
> one set an enum value from e.g. a JDBC PreparedStatement in that
> scenario?
>
>
Heh ... I read the statement the other way, i.e. maybe we should treat
them entirely as strings with no length limitation. The trouble is they
are half identifiers and half not right now. We certainly can't treat
them fully as identifiers unless I'm right off course - the ambiguities
would be horrendous.

cheers

andrew


Re: Current enums patch

From
Tom Dunstan
Date:
Tom Lane wrote:
> But probably making them act like identifiers is not a good idea,
> because I doubt we want automatic downcasing in enum_in.  People
> wouldn't be happy if they had to write WHERE color = '"Red"' or
 > something like that to get at a mixed-case enum label.

Ah, that's what you had in mind. Yeah, that's a bit verbose.

 > Let's
> just throw the error instead.  (I agree that enum_in can just fail
> with "no such label", but CREATE TYPE ought to give a specific
> error about it.)

Agreed.

Andrew, you said you had a mostly-working patch already?

Cheers

Tom


Re: Current enums patch

From
Andrew Dunstan
Date:
Tom Dunstan wrote:
>> Let's just throw the error instead.  (I agree that enum_in can just fail
>> with "no such label", but CREATE TYPE ought to give a specific
>> error about it.)
>
> Agreed.
>
> Andrew, you said you had a mostly-working patch already?
>
>

Working patch attached. If everyone's happy I'll apply it.

cheers

andrew

Index: src/backend/commands/typecmds.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/typecmds.c,v
retrieving revision 1.101
diff -c -r1.101 typecmds.c
*** src/backend/commands/typecmds.c    2 Apr 2007 03:49:38 -0000    1.101
--- src/backend/commands/typecmds.c    2 Apr 2007 19:57:40 -0000
***************
*** 949,954 ****
--- 949,955 ----
      Oid        enumNamespace;
      Oid        enumTypeOid;
      AclResult    aclresult;
+     ListCell    *lc;

      /* Convert list of names to a name and namespace */
      enumNamespace = QualifiedNameGetCreationNamespace(stmt->typename,
***************
*** 970,975 ****
--- 971,987 ----
                   errmsg("type names must be %d characters or less",
                          NAMEDATALEN - 2)));

+     foreach (lc, stmt->vals)
+     {
+         char *lab = strVal(lfirst(lc));
+         if (strlen(lab) > (NAMEDATALEN - 1))
+             ereport(ERROR,
+                     (errcode(ERRCODE_INVALID_NAME),
+                      errmsg("invalid enum label \"%s\", must be %d characters or less",
+                             lab,
+                             NAMEDATALEN - 1)));
+     }
+
      /* Create the pg_type entry */
      enumTypeOid =
          TypeCreate(enumName,        /* type name */
Index: src/backend/utils/adt/enum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/enum.c,v
retrieving revision 1.1
diff -c -r1.1 enum.c
*** src/backend/utils/adt/enum.c    2 Apr 2007 03:49:39 -0000    1.1
--- src/backend/utils/adt/enum.c    2 Apr 2007 19:57:41 -0000
***************
*** 44,49 ****
--- 44,57 ----
  {
      HeapTuple tup;
      Oid enumoid;
+     size_t namelen = strlen(name);
+
+     if (namelen >= NAMEDATALEN)
+         ereport(ERROR,
+                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                  errmsg("invalid input value for enum %s: \"%s\"",
+                         format_type_be(enumtypoid),
+                         name)));

      tup = SearchSysCache(ENUMTYPOIDNAME,
                           ObjectIdGetDatum(enumtypoid),
Index: src/test/regress/expected/enum.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/enum.out,v
retrieving revision 1.1
diff -c -r1.1 enum.out
*** src/test/regress/expected/enum.out    2 Apr 2007 03:49:42 -0000    1.1
--- src/test/regress/expected/enum.out    2 Apr 2007 19:57:43 -0000
***************
*** 40,45 ****
--- 40,59 ----
  (6 rows)

  --
+ -- Name, Values too long
+ --
+ CREATE TYPE
+  abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789
+  AS ENUM('a');
+ NOTICE:  identifier "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789" will be truncated to
"abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxy"
+ ERROR:  type names must be 62 characters or less
+ CREATE TYPE toolong AS ENUM
+ ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789');
+ ERROR:  invalid enum label "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789", must be 63
charactersor less 
+ INSERT INTO enumtest VALUES
+ ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789');
+ ERROR:  invalid input value for enum rainbow:
"abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789"
+ --
  -- Operators, no index
  --
  SELECT * FROM enumtest WHERE col = 'orange';
Index: src/test/regress/sql/enum.sql
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/sql/enum.sql,v
retrieving revision 1.1
diff -c -r1.1 enum.sql
*** src/test/regress/sql/enum.sql    2 Apr 2007 03:49:42 -0000    1.1
--- src/test/regress/sql/enum.sql    2 Apr 2007 19:57:44 -0000
***************
*** 27,32 ****
--- 27,45 ----
  SELECT * FROM enumtest;

  --
+ -- Name, Values too long
+ --
+ CREATE TYPE
+  abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789
+  AS ENUM('a');
+
+ CREATE TYPE toolong AS ENUM
+ ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789');
+
+ INSERT INTO enumtest VALUES
+ ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789');
+
+ --
  -- Operators, no index
  --
  SELECT * FROM enumtest WHERE col = 'orange';

Re: Current enums patch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Working patch attached. If everyone's happy I'll apply it.

Why not put the create-time length test into EnumValuesCreate's loop,
which has to grovel through the list already?

I'm wondering why bother with the temp variable in cstring_enum,
as opposed to just "if (strlen(name) >= NAMEDATALEN)"?
Also, a comment about why the test is necessary seems appropriate,
since otherwise someone might think it redundant:
    /* must check length to prevent Assert failure within SearchSysCache */

Lastly, a three-part regression test for this seems a bit silly.
Or a lot silly.  If we added test cases for every niggling little
bug we fix, the regression tests would be taking an hour to run
and would be less productive, not more, because people wouldn't
run them as often.

            regards, tom lane

Re: Current enums patch

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> Working patch attached. If everyone's happy I'll apply it.
>>
>
> Why not put the create-time length test into EnumValuesCreate's loop,
> which has to grovel through the list already?
>

My idea was to do the error check before calling TypeCreate. If that
doesn't matter I can move it - it just seemed a bit cleaner to do it
that way than to have to roll back a change to pg_type.

> I'm wondering why bother with the temp variable in cstring_enum,
> as opposed to just "if (strlen(name) >= NAMEDATALEN)"?
>

Originally I was going to use the length in the message. But I have
changed it now.

> Also, a comment about why the test is necessary seems appropriate,
> since otherwise someone might think it redundant:
>     /* must check length to prevent Assert failure within SearchSysCache */
>

OK
> Lastly, a three-part regression test for this seems a bit silly.
> Or a lot silly.  If we added test cases for every niggling little
> bug we fix, the regression tests would be taking an hour to run
> and would be less productive, not more, because people wouldn't
> run them as often.
>
>

OK.

cheers

andrew

Re: Current enums patch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Why not put the create-time length test into EnumValuesCreate's loop,
>> which has to grovel through the list already?

> My idea was to do the error check before calling TypeCreate. If that
> doesn't matter I can move it - it just seemed a bit cleaner to do it
> that way than to have to roll back a change to pg_type.

Well, if this were a performance-critical case then yes, but it isn't.
I'd just as soon keep the code compact.  Besides, typecmds.c isn't
really directly aware of this issue: the reason it's EnumValuesCreate's
bailiwick is that the latter is what's truncating the strings.  If we
ever wanted to change the behavior, it'd be better to keep the
knowledge more localized.

            regards, tom lane