Thread: Current enums patch
Hi all Here's the current version of the enums patch. Not much change from last time, the only thought-inducing stuff was fixing up some macros that changed with the VARLENA changes, and adding a regression test to do basic checking of RI behavior, after the discussions that we had recently on the ri_trigger stuff with generic types. The actual behavior was fixed by Tom's earlier patch, so this is just a sanity check. Cheers Tom
Attachment
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --------------------------------------------------------------------------- Tom Dunstan wrote: > Hi all > > Here's the current version of the enums patch. Not much change from last > time, the only thought-inducing stuff was fixing up some macros that > changed with the VARLENA changes, and adding a regression test to do > basic checking of RI behavior, after the discussions that we had > recently on the ri_trigger stuff with generic types. The actual behavior > was fixed by Tom's earlier patch, so this is just a sanity check. > > Cheers > > Tom [ application/x-gzip is not supported, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Am Dienstag, 27. März 2007 03:36 schrieb Tom Dunstan: > Here's the current version of the enums patch. Not much change from last > time, the only thought-inducing stuff was fixing up some macros that > changed with the VARLENA changes, and adding a regression test to do > basic checking of RI behavior, after the discussions that we had > recently on the ri_trigger stuff with generic types. The actual behavior > was fixed by Tom's earlier patch, so this is just a sanity check. Your patch doesn't compile anymore. ccache cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing-g -I. -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o parse_coerce.o parse_coerce.c-MMD -MP -MF .deps/parse_coerce.Po parse_coerce.c: In function 'can_coerce_type': parse_coerce.c:460: error: too few arguments to function 'find_coercion_pathway' parse_coerce.c: In function 'find_coercion_pathway': parse_coerce.c:1817: error: too few arguments to function 'find_coercion_pathway' parse_coerce.c:1822: error: too few arguments to function 'find_coercion_pathway' This was only changed a few days ago, so you need to update your patch. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > Am Dienstag, 27. März 2007 03:36 schrieb Tom Dunstan: > >> Here's the current version of the enums patch. Not much change from last >> time, the only thought-inducing stuff was fixing up some macros that >> changed with the VARLENA changes, and adding a regression test to do >> basic checking of RI behavior, after the discussions that we had >> recently on the ri_trigger stuff with generic types. The actual behavior >> was fixed by Tom's earlier patch, so this is just a sanity check. >> > > Your patch doesn't compile anymore. > > ccache cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing-g -I. -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o parse_coerce.o parse_coerce.c-MMD -MP -MF .deps/parse_coerce.Po > parse_coerce.c: In function 'can_coerce_type': > parse_coerce.c:460: error: too few arguments to function 'find_coercion_pathway' > parse_coerce.c: In function 'find_coercion_pathway': > parse_coerce.c:1817: error: too few arguments to function 'find_coercion_pathway' > parse_coerce.c:1822: error: too few arguments to function 'find_coercion_pathway' > > This was only changed a few days ago, so you need to update your patch. > > Peter, If you want to review or test the feature, the attached patch can be used as a replacement for the portion that affects parse_coerce.c, and with this it compiles and passes regression. I think it's correct but it should still be OKed by at least one Tom. :-) cheers andrew Index: parse_coerce.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/parser/parse_coerce.c,v retrieving revision 2.152 diff -c -r2.152 parse_coerce.c *** parse_coerce.c 27 Mar 2007 23:21:10 -0000 2.152 --- parse_coerce.c 31 Mar 2007 18:53:08 -0000 *************** *** 132,137 **** --- 132,138 ---- } if (targetTypeId == ANYOID || targetTypeId == ANYELEMENTOID || + targetTypeId == ANYENUMOID || (targetTypeId == ANYARRAYOID && inputTypeId != UNKNOWNOID)) { /* *************** *** 406,414 **** if (targetTypeId == ANYOID) continue; ! /* accept if target is ANYARRAY or ANYELEMENT, for now */ if (targetTypeId == ANYARRAYOID || ! targetTypeId == ANYELEMENTOID) { have_generics = true; /* do more checking later */ continue; --- 407,416 ---- if (targetTypeId == ANYOID) continue; ! /* accept if target is ANYARRAY, ANYELEMENT or ANYENUM, for now */ if (targetTypeId == ANYARRAYOID || ! targetTypeId == ANYELEMENTOID || ! targetTypeId == ANYENUMOID) { have_generics = true; /* do more checking later */ continue; *************** *** 451,456 **** --- 453,466 ---- continue; /* + * If input is an enum, can ANYENUM be cast to target? + */ + if (is_type_enum(inputTypeId) && + find_coercion_pathway(targetTypeId, ANYENUMOID, + ccontext, &funcId, &arrayCoerce)) + continue; + + /* * Else, cannot coerce at this argument position */ return false; *************** *** 1070,1076 **** Oid array_typeid = InvalidOid; Oid array_typelem; bool have_anyelement = false; ! /* * Loop through the arguments to see if we have any that are ANYARRAY or * ANYELEMENT. If so, require the actual types to be self-consistent --- 1080,1086 ---- Oid array_typeid = InvalidOid; Oid array_typelem; bool have_anyelement = false; ! bool have_enum = false; /* * Loop through the arguments to see if we have any that are ANYARRAY or * ANYELEMENT. If so, require the actual types to be self-consistent *************** *** 1079,1085 **** { Oid actual_type = actual_arg_types[j]; ! if (declared_arg_types[j] == ANYELEMENTOID) { have_anyelement = true; if (actual_type == UNKNOWNOID) --- 1089,1096 ---- { Oid actual_type = actual_arg_types[j]; ! if (declared_arg_types[j] == ANYELEMENTOID || ! declared_arg_types[j] == ANYENUMOID) { have_anyelement = true; if (actual_type == UNKNOWNOID) *************** *** 1087,1092 **** --- 1098,1105 ---- if (OidIsValid(elem_typeid) && actual_type != elem_typeid) return false; elem_typeid = actual_type; + if (declared_arg_types[j] == ANYENUMOID) + have_enum = true; } else if (declared_arg_types[j] == ANYARRAYOID) { *************** *** 1127,1132 **** --- 1140,1151 ---- } } + if (have_enum) + { + /* is the given type as enum type? */ + return is_type_enum(elem_typeid); + } + /* Looks valid */ return true; } *************** *** 1180,1186 **** Oid elem_typeid = InvalidOid; Oid array_typeid = InvalidOid; Oid array_typelem; ! bool have_anyelement = (rettype == ANYELEMENTOID); /* * Loop through the arguments to see if we have any that are ANYARRAY or --- 1199,1206 ---- Oid elem_typeid = InvalidOid; Oid array_typeid = InvalidOid; Oid array_typelem; ! bool have_anyelement = (rettype == ANYELEMENTOID || ! rettype == ANYENUMOID); /* * Loop through the arguments to see if we have any that are ANYARRAY or *************** *** 1190,1196 **** { Oid actual_type = actual_arg_types[j]; ! if (declared_arg_types[j] == ANYELEMENTOID) { have_generics = have_anyelement = true; if (actual_type == UNKNOWNOID) --- 1210,1217 ---- { Oid actual_type = actual_arg_types[j]; ! if (declared_arg_types[j] == ANYELEMENTOID || ! declared_arg_types[j] == ANYENUMOID) { have_generics = have_anyelement = true; if (actual_type == UNKNOWNOID) *************** *** 1289,1295 **** if (actual_type != UNKNOWNOID) continue; ! if (declared_arg_types[j] == ANYELEMENTOID) declared_arg_types[j] = elem_typeid; else if (declared_arg_types[j] == ANYARRAYOID) { --- 1310,1317 ---- if (actual_type != UNKNOWNOID) continue; ! if (declared_arg_types[j] == ANYELEMENTOID || ! declared_arg_types[j] == ANYENUMOID) declared_arg_types[j] = elem_typeid; else if (declared_arg_types[j] == ANYARRAYOID) { *************** *** 1323,1329 **** } /* if we return ANYELEMENTOID use the appropriate argument type */ ! if (rettype == ANYELEMENTOID) return elem_typeid; /* we don't return a generic type; send back the original return type */ --- 1345,1351 ---- } /* if we return ANYELEMENTOID use the appropriate argument type */ ! if (rettype == ANYELEMENTOID || rettype == ANYENUMOID) return elem_typeid; /* we don't return a generic type; send back the original return type */ *************** *** 1362,1368 **** format_type_be(context_actual_type)))); return context_actual_type; } ! else if (context_declared_type == ANYELEMENTOID) { /* Use the array type corresponding to actual type */ Oid array_typeid = get_array_type(context_actual_type); --- 1384,1391 ---- format_type_be(context_actual_type)))); return context_actual_type; } ! else if (context_declared_type == ANYELEMENTOID || ! context_declared_type == ANYENUMOID) { /* Use the array type corresponding to actual type */ Oid array_typeid = get_array_type(context_actual_type); *************** *** 1375,1381 **** return array_typeid; } } ! else if (declared_type == ANYELEMENTOID) { if (context_declared_type == ANYARRAYOID) { --- 1398,1404 ---- return array_typeid; } } ! else if (declared_type == ANYELEMENTOID || declared_type == ANYENUMOID) { if (context_declared_type == ANYARRAYOID) { *************** *** 1389,1395 **** format_type_be(context_actual_type)))); return array_typelem; } ! else if (context_declared_type == ANYELEMENTOID) { /* Use the actual type; it doesn't matter if array or not */ return context_actual_type; --- 1412,1419 ---- format_type_be(context_actual_type)))); return array_typelem; } ! else if (context_declared_type == ANYELEMENTOID || ! context_declared_type == ANYENUMOID) { /* Use the actual type; it doesn't matter if array or not */ return context_actual_type; *************** *** 1502,1507 **** --- 1526,1532 ---- case (INTERNALOID): case (OPAQUEOID): case (ANYELEMENTOID): + case (ANYENUMOID): result = GENERIC_TYPE; break; *************** *** 1634,1639 **** --- 1659,1668 ---- if (srctype == targettype) return true; + /* Check for enums */ + if (targettype == ANYENUMOID) + return is_type_enum(srctype); + /* If srctype is a domain, reduce to its base type */ if (OidIsValid(srctype)) srctype = getBaseType(srctype); *************** *** 1742,1747 **** --- 1771,1790 ---- ReleaseSysCache(tuple); } + /* + * Deal with enums. If the input type is an enum, and we haven't found + * an explicit pg_cast entry above, retry with ANYENUM. + */ + else if (is_type_enum(sourceTypeId)) + { + result = find_coercion_pathway(targetTypeId, ANYENUMOID, + ccontext, funcid, arrayCoerce); + } + else if (is_type_enum(targetTypeId)) + { + result = find_coercion_pathway(ANYENUMOID, sourceTypeId, + ccontext, funcid, arrayCoerce); + } else { /* *************** *** 1777,1782 **** --- 1820,1826 ---- result = true; } } + } return result;
Andrew Dunstan <andrew@dunslane.net> writes: > If you want to review or test the feature, the attached patch can be > used as a replacement for the portion that affects parse_coerce.c, and > with this it compiles and passes regression. I think it's correct but it > should still be OKed by at least one Tom. :-) Since Neil seems to have dropped the ball on this, I'm going to pick it up and review/apply it. I need a break from thinking about HOT and other bizarre schemes ;-) Barring objection from Tom D, I'll start with this version. regards, tom lane
>>> Here's the current version of the enums patch. [ sounds of reviewing... ] Is there a specific reason for pg_enum.enumname to be type name and not type text? ISTM that type name wastes space (because most labels will probably be a lot shorter than 63 bytes) and at the same time imposes an implementation restriction that we don't need to have. It would make sense if the enum labels were treated syntactically as SQL identifiers, but they're treated as strings. And there's no particular win to be had by having a fixed-length struct, since there's no more fields anyway. Unless someone objects, I'll change this and also revert to the enumlabel name that seems to have been used originally (it was still used in the docs). It seems more readable somehow (I guess it's the lack of either ascenders or descenders in "enumname"). regards, tom lane
Tom Lane wrote: >>>> Here's the current version of the enums patch. >>>> > > [ sounds of reviewing... ] (What are those? It's a bit hard to imagine you singing "doo di doo doo" a la Homer while reviewing ....) > Is there a specific reason for > pg_enum.enumname to be type name and not type text? ISTM that type name > wastes space (because most labels will probably be a lot shorter than 63 > bytes) and at the same time imposes an implementation restriction that > we don't need to have. It would make sense if the enum labels were > treated syntactically as SQL identifiers, but they're treated as > strings. And there's no particular win to be had by having a > fixed-length struct, since there's no more fields anyway. > IIRC at one stage Tom wanted to try to make these identifiers, but that was quickly abandoned. This might be a hangover from that. If someone wants to use an insanely long enum label I guess that's their lookout. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> Is there a specific reason for >> pg_enum.enumname to be type name and not type text? > IIRC at one stage Tom wanted to try to make these identifiers, but that > was quickly abandoned. This might be a hangover from that. Actually I think I see the reason: it's a bit of a pain in the neck to use the syscache mechanism with text-type lookup keys. I'm not 100% convinced that we really need to have syscaches on pg_enum, but if those stay then it's probably not worth the trouble to avoid the limitation. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> Tom Lane wrote: >> >>> Is there a specific reason for >>> pg_enum.enumname to be type name and not type text? >>> > > >> IIRC at one stage Tom wanted to try to make these identifiers, but that >> was quickly abandoned. This might be a hangover from that. >> > > Actually I think I see the reason: it's a bit of a pain in the neck to > use the syscache mechanism with text-type lookup keys. I'm not 100% > convinced that we really need to have syscaches on pg_enum, but if those > stay then it's probably not worth the trouble to avoid the limitation. > > > That rings a faint bell. If we don't have syscaches on pg_enum won't enum i/o get more expensive? cheers andrew
Hi all! Thanks for reviewing the patch! Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> Tom Lane wrote: >>> Is there a specific reason for >>> pg_enum.enumname to be type name and not type text? > >> IIRC at one stage Tom wanted to try to make these identifiers, but that >> was quickly abandoned. This might be a hangover from that. > > Actually I think I see the reason: it's a bit of a pain in the neck to > use the syscache mechanism with text-type lookup keys. I'm not 100% > convinced that we really need to have syscaches on pg_enum, but if those > stay then it's probably not worth the trouble to avoid the limitation. Yeah, that was the reason IIRC. The syscaches are used by the I/O functions: The one keyed on the pg_enum OID is for output, and the one keyed on the type OID and label, err, name, are for input. As suggested by a certain party here [1]. There didn't seem to be any text-like key types to use in the syscache except the name type, and I didn't see the 63 byte limit being a big deal, that's way bigger than any sane enum name that I've ever seen. If we ditched the second syscache, we'd want some other way to convert a type OID and name into the enum value oid efficiently. I originally suggested having a cache that got hooked onto an fn_extra field; that idea could be resurrected if you don't like the syscache. Cheers Tom 1[] http://archives.postgresql.org/pgsql-hackers/2006-08/msg01022.php
Tom Lane wrote: > Unless someone objects, I'll change this and also revert to the > enumlabel name that seems to have been used originally (it was still > used in the docs). It seems more readable somehow (I guess it's the > lack of either ascenders or descenders in "enumname"). The name/text thing is discussed downthread. I actually started out calling the field the name and changed it to the label, but perhaps I only did that in the docs. It was probably while I was writing the docs that I realized that name could refer to the enum type name or the value name, which was confusing, but "value name" was kinda cumbersome, hence "label". Change it over with my blessing. Cheers Tom
Tom Dunstan <pgsql@tomd.cc> writes: > If we ditched the second syscache, we'd want some other way to convert a > type OID and name into the enum value oid efficiently. True. OK, never mind that then. I'll still rename it as per your other comment. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> If you want to review or test the feature, the attached patch can be >> used as a replacement for the portion that affects parse_coerce.c, and >> with this it compiles and passes regression. I think it's correct but it >> should still be OKed by at least one Tom. :-) > Barring objection from Tom D, I'll start with this version. OK, I've now had a chance to look at Andrew's update of the patch, which just seems to pass through the new arrayCoerce parameter to the find_coercion_pathway calls. It almost doesn't matter what gets passed in: find_coercion_pathway should never set that to true in our calls to it in the enum code, as we're passing ANYENUMOID through to the recursed call and that'll never hit the array coercion branch. In summary: looks good to me! Cheers Tom
Tom Lane wrote: > Unless someone objects, I'll change this and also revert to the > enumlabel name that seems to have been used originally (it was still > used in the docs). It seems more readable somehow (I guess it's the > lack of either ascenders or descenders in "enumname"). Wow, I wasn't aware we were picking our terms based on typography. It is kind of cool. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
BTW, I've got fairly serious problems with a few of the "cuter" parts of the patch: enum_first, enum_last: these return ANYENUM but violate the rule that a polymorphic-result function must have a polymorphic input argument from which the parser may deduce the actual output type. enum_range_all: same problem except ANYARRAY result. text_enum: same problem as above, plus not acceptable to assume that it gets the right enum OID, plus very definitely not acceptable to assume that OID and typmod are the same size, plus I don't like the undocumented kluge in getBaseTypeAndTypmod that is pretending to supply the type OID for some small fraction of possible invocation cases. I think text_enum is probably toast. We might be able to salvage the other three if we can think of some reasonable approach to type determination. One rather ugly possibility is that the argument is a value of the target type --- which you can always have as null, so enum_first(null::rainbow) = 'red' where enum_first is declared as taking and returning anyenum. This seems a bit yucky as opposed to the regtype approach in the patch, and yet there are cases where it would be more handy --- eg, if you are working with a table column "col" of some enum type, you can do enum_first(col) instead of hardwiring the enum name. There might be other better ways, though. Thoughts? regards, tom lane
Tom Lane wrote: > enum_first, enum_last: these return ANYENUM but violate the rule that > a polymorphic-result function must have a polymorphic input argument > from which the parser may deduce the actual output type. > Is this a tragedy when the supplied parameter gives the result type directly? If it really is, maybe we should return text instead of the enum directly (or array of text in the case of enum_range). cheers andrew
Andrew Dunstan wrote: > Tom Lane wrote: >> enum_first, enum_last: these return ANYENUM but violate the rule that >> a polymorphic-result function must have a polymorphic input argument >> from which the parser may deduce the actual output type. > > Is this a tragedy when the supplied parameter gives the result type > directly? I've been having a play with this. If you create a function taking an enum type as a parameter, you can feed the output from enum_first into it, regardless of the type given to enum_first. I doubt that it would be a problem in practice, but it's certainly not great. > One rather ugly possibility is that the argument is a > value of the target type --- which you can always have as null, so > > enum_first(null::rainbow) = 'red' > > where enum_first is declared as taking and returning anyenum. I don't think that'll fly. If it's passed a null value, how does enum_first know which enum type it's dealing with? Can you get the type from the null value in some way? > This > seems a bit yucky as opposed to the regtype approach in the patch, > and yet there are cases where it would be more handy --- eg, if > you are working with a table column "col" of some enum type, you > can do enum_first(col) instead of hardwiring the enum name. That's ok, as long as nulls work. > There might be other better ways, though. Thoughts? *Ponder*. In java, you can tie a generic return value to a particular class by passing the class in as a bound generic parameter... could we have a special rule that would look for e.g. a regtype as the first parameter if the return type is generic and there are no generic parameters? As a side note, the helper functions were intended for people writing functions in plpgsql or whatever, allowing them to not hardcode the values of the enum in their function. I consider them nice-to-have rather than definitely required. If we can't come up with a nice way to do them for 8.3, I'm not absolutely wedded to them. It would be *nice*, though. I really would like the cast from text, though, but I'll deal with that in another email. Regards Tom
> text_enum: same problem as above, plus not acceptable to assume that it > gets the right enum OID, plus very definitely not acceptable to assume > that OID and typmod are the same size, plus I don't like the > undocumented kluge in getBaseTypeAndTypmod that is pretending to supply > the type OID for some small fraction of possible invocation cases. > > I think text_enum is probably toast. This was the ugliest part of the patch, and I mentioned it explicitly in the original submission because I was uncomfortable about it. The proper solution seemed to be to have another allowed cast function signature that would take the regtype rather than the typmod. That got just a little beyond what I wanted to do in the patch, and ugly as the getBaseTypeAndTypmod hack was, it seemed less intrusive. Another way to skirt the issue was to simply set the typmod of enum types to have their own OID, but a) I wasn't sure what other things the system might be inferring from the presence of a typmod, and b) you just stated that we can't assume that typmod is big enough to hold an OID anyway. I'm less concerned about the generic return type in this case, since the parser should know the return type when an explicit cast has been called. <Goes to check> Hmm, ok, maybe not. Looks like it's using the return type of the cast function and throwing away the explicit cast information. That's not very nice, but can probably be fixed in the parser. Thoughts? Tom
Tom Dunstan <pgsql@tomd.cc> writes: > Andrew Dunstan wrote: >> Tom Lane wrote: >>> enum_first, enum_last: these return ANYENUM but violate the rule that >>> a polymorphic-result function must have a polymorphic input argument >>> from which the parser may deduce the actual output type. >> >> Is this a tragedy when the supplied parameter gives the result type >> directly? > I've been having a play with this. If you create a function taking an > enum type as a parameter, you can feed the output from enum_first into > it, regardless of the type given to enum_first. I doubt that it would be > a problem in practice, but it's certainly not great. Well, that's exactly the point: the proposed functions break the type system by allowing values of one enum type to be fed to a function expecting a different one. Even though that's unlikely to cause a system crash, it's still wrong. We need to define these functions in a way that allows type safety to be checked at parse time, the same as every other type is required to be. >>> One rather ugly possibility is that the argument is a >>> value of the target type --- which you can always have as null, so >>> >>> enum_first(null::rainbow) = 'red' >>> >>> where enum_first is declared as taking and returning anyenum. > I don't think that'll fly. If it's passed a null value, how does > enum_first know which enum type it's dealing with? Can you get the type > from the null value in some way? The null Datum itself obviously doesn't carry that info, but the expression tree does, and there are provisions for letting functions retrieve that info --- see get_fn_expr_rettype and get_fn_expr_argtype. It occurred to me this morning that get_fn_expr_rettype might represent salvation for text_enum as well, though I've not tried it yet. > ... could we > have a special rule that would look for e.g. a regtype as the first > parameter if the return type is generic and there are no generic parameters? I thought about that too but don't like it much. The problem is mainly that it can only work for a constant regtype parameter. regards, tom lane
Tom Lane wrote: > The null Datum itself obviously doesn't carry that info, but the > expression tree does, and there are provisions for letting functions > retrieve that info --- see get_fn_expr_rettype and get_fn_expr_argtype. Hmm. I vaguely remember that there was some feeling that the PLs wouldn't always fill out the FmgrInfo struct, but perhaps that was just the case with I/O functions. >> ... could we >> have a special rule that would look for e.g. a regtype as the first >> parameter if the return type is generic and there are no generic parameters? > > I thought about that too but don't like it much. The problem is mainly > that it can only work for a constant regtype parameter. OK, I give up. :) Why? Thanks Tom
Tom Dunstan <pgsql@tomd.cc> writes: > Tom Lane wrote: >>> ... could we >>> have a special rule that would look for e.g. a regtype as the first >>> parameter if the return type is generic and there are no generic parameters? >> >> I thought about that too but don't like it much. The problem is mainly >> that it can only work for a constant regtype parameter. > OK, I give up. :) Why? Because the whole point is that the type has to be known at parse time. I've got it working with get_fn_expr_argtype and it seems fairly reasonable. regards, tom lane
Tom Lane wrote: >> OK, I give up. :) Why? > > Because the whole point is that the type has to be known at parse time. Oh, duh. :) > I've got it working with get_fn_expr_argtype and it seems fairly > reasonable. Cool! Thanks Tom
Tom Dunstan <pgsql@tomd.cc> writes: > Here's the current version of the enums patch. Applied with revisions --- some cosmetic, some not so much. Attached is the patch-as-applied if you care to compare; feel free to ask questions about anything not obvious. regards, tom lane
Attachment
Tom Lane wrote: > Tom Dunstan <pgsql@tomd.cc> writes: >> Here's the current version of the enums patch. > > Applied with revisions --- some cosmetic, some not so much. > Attached is the patch-as-applied if you care to compare; feel free > to ask questions about anything not obvious. There's a little bug: postgres=# CREATE TYPE aenum AS ENUM ('a','b','c'); CREATE TYPE postgres=# CREATE TABLE t (e aenum); CREATE TABLE postgres=# INSERT INTO t VALUES ('foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo'); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. postgres=# In the server log: TRAP: FailedAssertion("!(keylen < 64)", File: "hashfunc.c", Line: 145) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Tom Lane wrote: >> Tom Dunstan <pgsql@tomd.cc> writes: >>> Here's the current version of the enums patch. >> >> Applied with revisions --- some cosmetic, some not so much. >> Attached is the patch-as-applied if you care to compare; feel free >> to ask questions about anything not obvious. > > There's a little bug: > > postgres=# CREATE TYPE aenum AS ENUM ('a','b','c'); CREATE TYPE > postgres=# CREATE TABLE t (e aenum); CREATE TABLE postgres=# INSERT > INTO t VALUES > ('foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo'); > > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Succeeded. > postgres=# > > In the server log: > TRAP: FailedAssertion("!(keylen < 64)", File: "hashfunc.c", Line: 145) > > Looks like we need to check the length on type creation too. I'll fix later if not beaten to it. cheers andrew
> Looks like we need to check the length on type creation > too. > > I'll fix later if not beaten to it. It works for me (ie fails with an appropriate error) locally on Linux FC6 x86-64. Perhaps this platform initializes memory to zero on allocation? I dunno. Anyway, if you can reproduce it, please have a go, I'm at work and won't be able to do anything more in depth for some time. I could have sworn that I had some bounds checking in there at one point, but it doesn't seem to be in the latest version of the code. *shrug*. It should be both in cstring_enum() in enum.c and in DefineEnum() in typecmds.c. Cheers Tom
"Tom Dunstan" <pgsql@tomd.cc> writes: >> Looks like we need to check the length on type creation >> too. > It works for me (ie fails with an appropriate error) locally > on Linux FC6 x86-64. Perhaps this platform initializes > memory to zero on allocation? Sounds more like you're testing without asserts enabled; tut tut. regards, tom lane
Heikki Linnakangas <heikki@enterprisedb.com> writes: > There's a little bug: > postgres=# CREATE TYPE aenum AS ENUM ('a','b','c'); CREATE TYPE > postgres=# CREATE TABLE t (e aenum); CREATE TABLE postgres=# INSERT > INTO t VALUES > ('foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo'); > server closed the connection unexpectedly Hm, I suppose we should apply truncate_identifier rather than letting the strings be blindly truncated (perhaps in mid-character). Should we have it throw the truncation NOTICE, or not? First thought is to do so during CREATE TYPE but not during plain enum_in(). regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: > >> There's a little bug: >> > > >> postgres=# CREATE TYPE aenum AS ENUM ('a','b','c'); CREATE TYPE >> postgres=# CREATE TABLE t (e aenum); CREATE TABLE postgres=# INSERT >> INTO t VALUES >> ('foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo'); >> server closed the connection unexpectedly >> > > Hm, I suppose we should apply truncate_identifier rather than letting > the strings be blindly truncated (perhaps in mid-character). Should we > have it throw the truncation NOTICE, or not? First thought is to do so > during CREATE TYPE but not during plain enum_in(). > I don't see much point in truncating. The patch I have so far gives the regression output shown below (yes, I should make the messages more consistent). In fact the truncation and associated NOTICE just strike me as confusing. cheers andrew + -- Name, Values too long + -- + CREATE TYPE + abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789 + AS ENUM('a'); + NOTICE: identifier "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789" will be truncated to "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxy" + ERROR: type names must be 62 characters or less + CREATE TYPE toolong AS ENUM + ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789'); + ERROR: invalid enum label "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789": must be 63 characters or less + INSERT INTO enumtest VALUES + ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789'); + ERROR: input value too long (74) for enum: "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789" + --
>> Hm, I suppose we should apply truncate_identifier rather than letting >> the strings be blindly truncated (perhaps in mid-character). Should we >> have it throw the truncation NOTICE, or not? First thought is to do so >> during CREATE TYPE but not during plain enum_in(). >> > > I don't see much point in truncating. > > The patch I have so far gives the regression output shown below (yes, I > should make the messages more consistent). > > In fact the truncation and associated NOTICE just strike me as confusing. [snip] > + ERROR: input value too long (74) for enum: > "abcdefghijklmnopqrsatuvwxyz012345 etc etc I was about to suggest that we just truncate the value in the input function and look it up on the basis that if it's too long, the lookup will fail and we can just tell the user that it wasn't a valid value. But if there's a valid value that has the same 63 bytes as the first 63 of the too-long input string, we'll end up looking up the valid one wrongly. So we do need to test for length and die at that point. Can we just fail with the same error message as trying to input a smaller, but similarly invalid string? At create time, we should definitely throw an error... if we just truncate the value at byte 64 (with a notice or not) we might truncate in the middle of a multi-byte character. Yuk. Cheers Tom
Tom Dunstan <pgsql@tomd.cc> writes: > I was about to suggest that we just truncate the value in the input > function and look it up on the basis that if it's too long, the lookup > will fail and we can just tell the user that it wasn't a valid value. > But if there's a valid value that has the same 63 bytes as the first 63 > of the too-long input string, we'll end up looking up the valid one > wrongly. So we do need to test for length and die at that point. Can we > just fail with the same error message as trying to input a smaller, but > similarly invalid string? > At create time, we should definitely throw an error... if we just > truncate the value at byte 64 (with a notice or not) we might truncate > in the middle of a multi-byte character. Yuk. While all this reasoning is perfectly OK on its own terms, it ignores the precedents of SQL identifier handling. Maybe we should revisit the question of whether the labels are identifiers. regards, tom lane
Tom Lane wrote: > > While all this reasoning is perfectly OK on its own terms, it ignores > the precedents of SQL identifier handling. Maybe we should revisit the > question of whether the labels are identifiers. > > If we do that can we still cache the values in the syscache? My impression from what you and TomD said was that it would be at least more difficult. If it's possible I'm all in favor. (Side note: ISTM there is a pretty good case to move the truncation code from the lexer to the parser where it can be invoked according to context. Maybe something to think about next cycle.). cheers andrew
Tom Lane wrote: > While all this reasoning is perfectly OK on its own terms, it ignores > the precedents of SQL identifier handling. Maybe we should revisit the > question of whether the labels are identifiers. Implying that they shouldn't be quoted like text (or should be double-quoted if required)? Originally when discussing the patch I thought that this was a good idea, but now I'm not so sure. How does one set an enum value from e.g. a JDBC PreparedStatement in that scenario? Cheers Tom
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> While all this reasoning is perfectly OK on its own terms, it ignores >> the precedents of SQL identifier handling. Maybe we should revisit the >> question of whether the labels are identifiers. > If we do that can we still cache the values in the syscache? Sure, as long as we're storing them as "name" it's not a problem. But probably making them act like identifiers is not a good idea, because I doubt we want automatic downcasing in enum_in. People wouldn't be happy if they had to write WHERE color = '"Red"' or something like that to get at a mixed-case enum label. Let's just throw the error instead. (I agree that enum_in can just fail with "no such label", but CREATE TYPE ought to give a specific error about it.) regards, tom lane
Tom Dunstan wrote: > Tom Lane wrote: >> While all this reasoning is perfectly OK on its own terms, it ignores >> the precedents of SQL identifier handling. Maybe we should revisit the >> question of whether the labels are identifiers. > > Implying that they shouldn't be quoted like text (or should be > double-quoted if required)? Originally when discussing the patch I > thought that this was a good idea, but now I'm not so sure. How does > one set an enum value from e.g. a JDBC PreparedStatement in that > scenario? > > Heh ... I read the statement the other way, i.e. maybe we should treat them entirely as strings with no length limitation. The trouble is they are half identifiers and half not right now. We certainly can't treat them fully as identifiers unless I'm right off course - the ambiguities would be horrendous. cheers andrew
Tom Lane wrote: > But probably making them act like identifiers is not a good idea, > because I doubt we want automatic downcasing in enum_in. People > wouldn't be happy if they had to write WHERE color = '"Red"' or > something like that to get at a mixed-case enum label. Ah, that's what you had in mind. Yeah, that's a bit verbose. > Let's > just throw the error instead. (I agree that enum_in can just fail > with "no such label", but CREATE TYPE ought to give a specific > error about it.) Agreed. Andrew, you said you had a mostly-working patch already? Cheers Tom
Tom Dunstan wrote: >> Let's just throw the error instead. (I agree that enum_in can just fail >> with "no such label", but CREATE TYPE ought to give a specific >> error about it.) > > Agreed. > > Andrew, you said you had a mostly-working patch already? > > Working patch attached. If everyone's happy I'll apply it. cheers andrew Index: src/backend/commands/typecmds.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/typecmds.c,v retrieving revision 1.101 diff -c -r1.101 typecmds.c *** src/backend/commands/typecmds.c 2 Apr 2007 03:49:38 -0000 1.101 --- src/backend/commands/typecmds.c 2 Apr 2007 19:57:40 -0000 *************** *** 949,954 **** --- 949,955 ---- Oid enumNamespace; Oid enumTypeOid; AclResult aclresult; + ListCell *lc; /* Convert list of names to a name and namespace */ enumNamespace = QualifiedNameGetCreationNamespace(stmt->typename, *************** *** 970,975 **** --- 971,987 ---- errmsg("type names must be %d characters or less", NAMEDATALEN - 2))); + foreach (lc, stmt->vals) + { + char *lab = strVal(lfirst(lc)); + if (strlen(lab) > (NAMEDATALEN - 1)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_NAME), + errmsg("invalid enum label \"%s\", must be %d characters or less", + lab, + NAMEDATALEN - 1))); + } + /* Create the pg_type entry */ enumTypeOid = TypeCreate(enumName, /* type name */ Index: src/backend/utils/adt/enum.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/enum.c,v retrieving revision 1.1 diff -c -r1.1 enum.c *** src/backend/utils/adt/enum.c 2 Apr 2007 03:49:39 -0000 1.1 --- src/backend/utils/adt/enum.c 2 Apr 2007 19:57:41 -0000 *************** *** 44,49 **** --- 44,57 ---- { HeapTuple tup; Oid enumoid; + size_t namelen = strlen(name); + + if (namelen >= NAMEDATALEN) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input value for enum %s: \"%s\"", + format_type_be(enumtypoid), + name))); tup = SearchSysCache(ENUMTYPOIDNAME, ObjectIdGetDatum(enumtypoid), Index: src/test/regress/expected/enum.out =================================================================== RCS file: /cvsroot/pgsql/src/test/regress/expected/enum.out,v retrieving revision 1.1 diff -c -r1.1 enum.out *** src/test/regress/expected/enum.out 2 Apr 2007 03:49:42 -0000 1.1 --- src/test/regress/expected/enum.out 2 Apr 2007 19:57:43 -0000 *************** *** 40,45 **** --- 40,59 ---- (6 rows) -- + -- Name, Values too long + -- + CREATE TYPE + abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789 + AS ENUM('a'); + NOTICE: identifier "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789" will be truncated to "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxy" + ERROR: type names must be 62 characters or less + CREATE TYPE toolong AS ENUM + ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789'); + ERROR: invalid enum label "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789", must be 63 charactersor less + INSERT INTO enumtest VALUES + ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789'); + ERROR: invalid input value for enum rainbow: "abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789" + -- -- Operators, no index -- SELECT * FROM enumtest WHERE col = 'orange'; Index: src/test/regress/sql/enum.sql =================================================================== RCS file: /cvsroot/pgsql/src/test/regress/sql/enum.sql,v retrieving revision 1.1 diff -c -r1.1 enum.sql *** src/test/regress/sql/enum.sql 2 Apr 2007 03:49:42 -0000 1.1 --- src/test/regress/sql/enum.sql 2 Apr 2007 19:57:44 -0000 *************** *** 27,32 **** --- 27,45 ---- SELECT * FROM enumtest; -- + -- Name, Values too long + -- + CREATE TYPE + abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789 + AS ENUM('a'); + + CREATE TYPE toolong AS ENUM + ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789'); + + INSERT INTO enumtest VALUES + ('abcdefghijklmnopqrsatuvwxyz0123456789abcdefghijklmnopqrsatuvwxyz0123456789'); + + -- -- Operators, no index -- SELECT * FROM enumtest WHERE col = 'orange';
Andrew Dunstan <andrew@dunslane.net> writes: > Working patch attached. If everyone's happy I'll apply it. Why not put the create-time length test into EnumValuesCreate's loop, which has to grovel through the list already? I'm wondering why bother with the temp variable in cstring_enum, as opposed to just "if (strlen(name) >= NAMEDATALEN)"? Also, a comment about why the test is necessary seems appropriate, since otherwise someone might think it redundant: /* must check length to prevent Assert failure within SearchSysCache */ Lastly, a three-part regression test for this seems a bit silly. Or a lot silly. If we added test cases for every niggling little bug we fix, the regression tests would be taking an hour to run and would be less productive, not more, because people wouldn't run them as often. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> Working patch attached. If everyone's happy I'll apply it. >> > > Why not put the create-time length test into EnumValuesCreate's loop, > which has to grovel through the list already? > My idea was to do the error check before calling TypeCreate. If that doesn't matter I can move it - it just seemed a bit cleaner to do it that way than to have to roll back a change to pg_type. > I'm wondering why bother with the temp variable in cstring_enum, > as opposed to just "if (strlen(name) >= NAMEDATALEN)"? > Originally I was going to use the length in the message. But I have changed it now. > Also, a comment about why the test is necessary seems appropriate, > since otherwise someone might think it redundant: > /* must check length to prevent Assert failure within SearchSysCache */ > OK > Lastly, a three-part regression test for this seems a bit silly. > Or a lot silly. If we added test cases for every niggling little > bug we fix, the regression tests would be taking an hour to run > and would be less productive, not more, because people wouldn't > run them as often. > > OK. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> Why not put the create-time length test into EnumValuesCreate's loop, >> which has to grovel through the list already? > My idea was to do the error check before calling TypeCreate. If that > doesn't matter I can move it - it just seemed a bit cleaner to do it > that way than to have to roll back a change to pg_type. Well, if this were a performance-critical case then yes, but it isn't. I'd just as soon keep the code compact. Besides, typecmds.c isn't really directly aware of this issue: the reason it's EnumValuesCreate's bailiwick is that the latter is what's truncating the strings. If we ever wanted to change the behavior, it'd be better to keep the knowledge more localized. regards, tom lane