Thread: BUG #5028: CASE returns ELSE value always when type is "char"
The following bug has been logged online: Bug reference: 5028 Logged by: Joseph Shraibman Email address: jks@selectacast.net PostgreSQL version: 8.3.7 Operating system: Linux Description: CASE returns ELSE value always when type is "char" Details: [local]:playpen=> SELECT version(); version ---------------------------------------------------------------------------- ------------------------------- PostgreSQL 8.3.7 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44) (1 row) Time: 1.658 ms [local]:playpen=> [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' ELSE c.relkind playpen-> END from (select 'r'::"char" AS relkind) c; relkind | relkind ---------+--------- r | t (1 row) Time: 1.407 ms [local]:playpen=> [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' playpen-> END from (select 'r'::"char" AS relkind) c; relkind | case ---------+------- r | table (1 row) Time: 0.426 ms [local]:playpen=> [local]:playpen=> [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' ELSE c.relkind playpen-> END from (select 'r'::char AS relkind) c; relkind | relkind ---------+--------- r | table (1 row)
On Tue, Sep 01, 2009 at 04:36:25PM +0000, Joseph Shraibman wrote: > Description: CASE returns ELSE value always when type is "char" I think it's just silently truncating the literal to a single character. > [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' > WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' > THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' ELSE > c.relkind > playpen-> END from (select 'r'::"char" AS relkind) c; > relkind | relkind > ---------+--------- > r | t Here, 'r' maps to the "char" literal 'table' which PG interprets as the value 't'--i.e. PG silently chops of the 'able'. The bug would seem to be in your code, but PG could maybe throw an error to tell you this is what is happening? A possible fix would be to have your ELSE clause as: c.relkind::text As that way the other branches would be interpreted as text and they wouldn't be getting chopped off along the way. -- Sam http://samason.me.uk/
It appears the problem is with the "char" being in the ELSE expression, not being in the CASE expression [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' ELSE c.relkind::char playpen-> END from (select 'r'::"char" AS relkind) c; relkind | relkind ---------+--------- r | table (1 row) Time: 0.674 ms [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' ELSE c.relkind::"char" playpen-> END from (select 'r'::char AS relkind) c; relkind | relkind ---------+--------- r | t (1 row) Joseph Shraibman wrote: > The following bug has been logged online: > > Bug reference: 5028 > Logged by: Joseph Shraibman > Email address: jks@selectacast.net > PostgreSQL version: 8.3.7 > Operating system: Linux > Description: CASE returns ELSE value always when type is "char" > Details: > > [local]:playpen=> SELECT version(); > version > > ---------------------------------------------------------------------------- > ------------------------------- > PostgreSQL 8.3.7 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) > 4.1.2 20080704 (Red Hat 4.1.2-44) > (1 row) > > Time: 1.658 ms > [local]:playpen=> > [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' > WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' > THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' ELSE > c.relkind > playpen-> END from (select 'r'::"char" AS relkind) c; > relkind | relkind > ---------+--------- > r | t > (1 row) > > Time: 1.407 ms > [local]:playpen=> > [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' > WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' > THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' > playpen-> END from (select 'r'::"char" AS relkind) c; > relkind | case > ---------+------- > r | table > (1 row) > > Time: 0.426 ms > [local]:playpen=> > [local]:playpen=> > [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' > WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' > THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' ELSE > c.relkind > playpen-> END from (select 'r'::char AS relkind) c; > relkind | relkind > ---------+--------- > r | table > (1 row) >
Sam Mason wrote: > On Tue, Sep 01, 2009 at 04:36:25PM +0000, Joseph Shraibman wrote: >> Description: CASE returns ELSE value always when type is "char" > > I think it's just silently truncating the literal to a single character. > >> [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table' >> WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' >> THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' ELSE >> c.relkind >> playpen-> END from (select 'r'::"char" AS relkind) c; >> relkind | relkind >> ---------+--------- >> r | t Ah, OK it seems I didn't pay close enough attention. > > Here, 'r' maps to the "char" literal 'table' which PG interprets as the > value 't'--i.e. PG silently chops of the 'able'. So the type of what is in the ELSE clause determines the type of the output?
Joseph Shraibman <jks@selectacast.net> writes: > So the type of what is in the ELSE clause determines the type of the output? If all the other branches are unknown literals, yes. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Joseph Shraibman <jks@selectacast.net> writes: >> So the type of what is in the ELSE clause determines the type of >> the output? > > If all the other branches are unknown literals, yes. What's the best place to look to get a handle on what the benefits are of treating character string literals as being of unknown type? (Other than the obvious backward compatibility issues.) -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Joseph Shraibman <jks@selectacast.net> writes: >>> So the type of what is in the ELSE clause determines the type of >>> the output? >> >> If all the other branches are unknown literals, yes. > What's the best place to look to get a handle on what the benefits are > of treating character string literals as being of unknown type? > (Other than the obvious backward compatibility issues.) I think the odds of changing that are not measurably different from zero. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Joseph Shraibman <jks@selectacast.net> writes: >>>> So the type of what is in the ELSE clause determines the type of >>>> the output? >>> >>> If all the other branches are unknown literals, yes. > >> What's the best place to look to get a handle on what the benefits >> are of treating character string literals as being of unknown type? >> (Other than the obvious backward compatibility issues.) > > I think the odds of changing that are not measurably different from > zero. I figured that; I'm just trying to understand what seems to me like an odd wart on the type system. I figure I must be missing something important, so I'd kinda like to find out what that is. -Kevin
On Tue, 2009-09-01 at 13:49 -0500, Kevin Grittner wrote: > I figured that; I'm just trying to understand what seems to me like an > odd wart on the type system. I figure I must be missing something > important, so I'd kinda like to find out what that is. If I understand your question, you're comparing: (a) leaving a literal as "unknown" until you've finished inferring types (current behavior) (b) casting every unknown to text immediately, and then trying to infer the types In general, option (b) eliminates information that might be useful for making good inferences about the correct operators to use, and also finding cases of ambiguity. For instance, consider the current behavior: 1. select now()::text < 'January 01, 2009'; -- true 2. select now() < 'January 01, 2009'; -- false 3. select now() < 'January 01, 2009'::text; ERROR: operator does not exist: timestamp with time zone < text Example #2 shows that we can infer the the RHS is of type timestamptz based on the type of the LHS. That's desirable behavior in any type-inferencing system -- without it you might as well just explicitly cast all literals. Example #3 is ambiguous: we have no way to know whether to choose "< (timestamptz, timestamptz)" or "< (text, text)", and an ERROR is desirable behavior to avoid confusing results. But you can't have both of those desirable behaviors unless you are somehow aware that "'January 01, 2009'" is something more malleable than "now()" in example #2. Calling the RHS "unknown" in example #2 gives us that information. Regards, Jeff Davis
On Tue, Sep 01, 2009 at 05:49:25PM +0100, Sam Mason wrote: > PG could maybe throw an error to tell you this is > what is happening? Would something like the included patch be accepted? -- Sam http://samason.me.uk/
Attachment
Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2009-09-01 at 13:49 -0500, Kevin Grittner wrote: >> I figured that; I'm just trying to understand what seems to me like >> an odd wart on the type system. I figure I must be missing >> something important, so I'd kinda like to find out what that is. > > If I understand your question, you're comparing: > > (a) leaving a literal as "unknown" until you've finished > inferring types (current behavior) > (b) casting every unknown to text immediately, and then trying to > infer the types No, that's not it. I'm wondering why it isn't treated as text. Period. Full stop. Nothing to infer. Anywhere that we have implicit casts defined from text to something else could, of course, still operate; but it would be text. No guessing. > In general, option (b) eliminates information that might be useful > for making good inferences about the correct operators to use, and > also finding cases of ambiguity. It often seems to have the opposite effect. See the original post. > For instance, consider the current behavior: > > 1. select now()::text < 'January 01, 2009'; -- true > 2. select now() < 'January 01, 2009'; -- false > 3. select now() < 'January 01, 2009'::text; > ERROR: operator does not exist: timestamp with time zone < > text In my view, it is wrong that any of those work. I would expect to have to code one of these: select now() < date '2009-01-01'; -- implicit casts should cover select now() < timestamp with time zone '2009-01-01 00:00:00.0'; I understand that there is probably a huge base of existing code which counts on being able to be sloppy with types and have PostgreSQL automagically infer types other than what is actually specified; but I'd rather not expose such sloppy behavior to those running ad hoc queries at my site. > Example #2 shows that we can infer the the RHS is of type > timestamptz based on the type of the LHS. That's desirable behavior > in any type-inferencing system -- without it you might as well just > explicitly cast all literals. Example #3 is ambiguous: we have no > way to know whether to choose "< (timestamptz, timestamptz)" or "< > (text, text)", and an ERROR is desirable behavior to avoid confusing > results. Here I think you have answered my question. It is seen as a feature, since it allows people to avoid the extra keystrokes of coding type-specific literal values, and allows them the entertainment of seeing how the values get interpreted. :-) > But you can't have both of those desirable behaviors Whether they are desirable is the point of disagreement. At least I now understand the reasoning. Thanks, -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Jeff Davis <pgsql@j-davis.com> wrote: >> If I understand your question, you're comparing: >> >> (a) leaving a literal as "unknown" until you've finished >> inferring types (current behavior) >> (b) casting every unknown to text immediately, and then trying to >> infer the types > No, that's not it. I'm wondering why it isn't treated as text. > Period. Full stop. Nothing to infer. Because then we would have to provide implicit casts from text to everything else, which would be horribly dangerous. > In my view, it is wrong that any of those work. I would expect to > have to code one of these: > select now() < date '2009-01-01'; -- implicit casts should cover > select now() < timestamp with time zone '2009-01-01 00:00:00.0'; [ shrug... ] The current design is a compromise between usability and strictness of semantics. This proposal appears to be all strictness and no usability. No thanks ... even without any backward-compatibility considerations, I wouldn't find this to be an improvement. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> No, that's not it. I'm wondering why it isn't treated as text. >> Period. Full stop. Nothing to infer. > > Because then we would have to provide implicit casts from text to > everything else, which would be horribly dangerous. I would like that even less. I like errors on type conflicts. >> In my view, it is wrong that any of those work. I would expect to >> have to code one of these: > >> select now() < date '2009-01-01'; -- implicit casts should cover >> select now() < timestamp with time zone '2009-01-01 00:00:00.0'; > > The current design is a compromise between usability and strictness > of semantics. This proposal appears to be all strictness and no > usability. I was not proposing anything; I was trying to understand the reasons for the current behavior so that I could think about what might make sense to address some of the places where current behavior causes a result which is different from a non-error result should be obtained under the standard. I couldn't begin to anticipate what might be acceptable in these situations without understanding the reason things are as they are. I do understand that there will be "convenience" extensions to the standard -- all products do that. I wasn't sure whether that was the reason for the behavior or whether there was something else in play. Thanks for clarifying, -Kevin
Sam Mason <sam@samason.me.uk> writes: > On Tue, Sep 01, 2009 at 05:49:25PM +0100, Sam Mason wrote: >> PG could maybe throw an error to tell you this is >> what is happening? > Would something like the included patch be accepted? ISTM this was debated once before and rejected. However, there's a proposal over here http://archives.postgresql.org/pgsql-hackers/2009-08/msg02073.php to tighten up what the "char" input functions will take, and maybe throwing error for excess input would fit in with that. regards, tom lane
On Wed, 2009-09-02 at 08:57 -0500, Kevin Grittner wrote: > > (a) leaving a literal as "unknown" until you've finished > > inferring types (current behavior) > > (b) casting every unknown to text immediately, and then trying to > > infer the types > > No, that's not it. I'm wondering why it isn't treated as text. > Period. Full stop. Nothing to infer. Anywhere that we have implicit > casts defined from text to something else could, of course, still > operate; but it would be text. No guessing. If you have very many implicit casts, I think you lose the predictability and safety you're looking for, and/or end up with a lot of errors that eliminate the convenience of implicit casting. > It often seems to have the opposite effect. See the original post. The original problem has more to do with the fact that interpreting an unknown value as a char seems to just discard a lot of information. I assume that's part of the standard, but it seems like a bad idea any time you silently discard data (which is why we prevented varchar(n) from silently truncating a while ago). > Here I think you have answered my question. It is seen as a feature, > since it allows people to avoid the extra keystrokes of coding > type-specific literal values, and allows them the entertainment of > seeing how the values get interpreted. :-) > > > But you can't have both of those desirable behaviors > > Whether they are desirable is the point of disagreement. At least I > now understand the reasoning. They are desirable for a system that infers types from context. I agree that there's more safety by explicitly declaring the type of all literals; but I disagree that using implicit casts to make up for a lack of an "unknown" type will improve matters (either for convenience or safety). Regards, Jeff Davis
On Wed, Sep 2, 2009 at 3:44 PM, Jeff Davis<pgsql@j-davis.com> wrote: > On Wed, 2009-09-02 at 08:57 -0500, Kevin Grittner wrote: >> > =A0 (a) leaving a literal as "unknown" until you've finished >> > =A0 =A0 =A0 inferring types (current behavior) >> > =A0 (b) casting every unknown to text immediately, and then trying to >> > =A0 =A0 =A0 infer the types >> >> No, that's not it. =A0I'm wondering why it isn't treated as text. >> Period. =A0Full stop. =A0Nothing to infer. =A0Anywhere that we have impl= icit >> casts defined from text to something else could, of course, still >> operate; but it would be text. =A0No guessing. > > If you have very many implicit casts, I think you lose the > predictability and safety you're looking for, and/or end up with a lot > of errors that eliminate the convenience of implicit casting. Perhaps we should stop thinking of "unknown" as, er, "unknown" and think of it as "text literal". A text literal has implicit casts to every data type but a normal text string has to be explicitly cast. Hm, that's not quite right because things like array(1)||'5' don't treat the '5' as a text literal. The "implicit cast" is preferred to treating it as text. --=20 greg http://mit.edu/~gsstark/resume.pdf
Jeff Davis <pgsql@j-davis.com> wrote: > I disagree that using implicit casts to make up for a lack of an > "unknown" type will improve matters I certainly never meant to imply that additional implicit casts should be added. I apologize for not being more clear about that. Thanks again for helping fill in the blanks. -Kevin
On Wed, Sep 02, 2009 at 03:50:05PM +0100, Greg Stark wrote: > Perhaps we should stop thinking of "unknown" as, er, "unknown" and > think of it as "text literal". A text literal has implicit casts to > every data type but a normal text string has to be explicitly cast. How does that help things? You seem to be keeping the semantics and only changing the name, when it's the semantics that you seem to be complaining about. I'm pretty sure it's correct to treat it as "unknown", other type-systems do this and it all works well. The thing that makes things ambiguous is the ad-hoc polymorphism that exists in function calls and operators. With PG's type system you know almost nothing about any types involved in an arbitrary expression, operators are better than functions (given the types of the arguments you know the return type, with default parameters even this knowledge doesn't exist with functions) but still leave things far too open to have any rigor without spelling out types in full everywhere. For example: CREATE FUNCTION add(int,int) RETURNS int LANGUAGE sql AS $$ SELECT $1 + $2; $$; CREATE FUNCTION add(int,int,int DEFAULT NULL) RETURNS text LANGUAGE sql AS $$ SELECT ($1 + $2)::text; $$; What type should it attribute to the result of: SELECT add(1,2); In fact it doesn't seem to want to play ball at all. Even given the apparently unambiguous: SELECT 1+add(1,2); or SELECT 'hi'||add(1,2); It doesn't get anywhere. No need for "text 'hi'" in the second one because || isn't defined for values of integer type. -- Sam http://samason.me.uk/
On Wed, Sep 2, 2009 at 11:55 AM, Sam Mason<sam@samason.me.uk> wrote: > In fact it doesn't seem to want to play ball at all. =A0Even given the > apparently unambiguous: > > =A0SELECT 1+add(1,2); > or > =A0SELECT 'hi'||add(1,2); > > It doesn't get anywhere. =A0No need for "text 'hi'" in the second one > because || isn't defined for values of integer type. Right. This is exactly the sort of thing that languages with real type inference have no problem handling. Of course, ML for example doesn't allow overloading precisely because (AIUI) it makes type inference difficult. It would be awesome if we could make this work though. ...Robert
On Wed, Sep 02, 2009 at 12:36:00PM -0400, Robert Haas wrote: > On Wed, Sep 2, 2009 at 11:55 AM, Sam Mason<sam@samason.me.uk> wrote: > > In fact it doesn't seem to want to play ball at all. Even given the > > apparently unambiguous: > > > > SELECT 1+add(1,2); > > or > > SELECT 'hi'||add(1,2); > > > > It doesn't get anywhere. No need for "text 'hi'" in the second one > > because || isn't defined for values of integer type. > > Right. This is exactly the sort of thing that languages with real > type inference have no problem handling. Of course, ML for example > doesn't allow overloading precisely because (AIUI) it makes type > inference difficult. It would be awesome if we could make this work > though. Difficult, but not intractable. Haskell has done this sort of thing for quite a while; although it handles ad-hoc polymorphism differently than PG does. You basically end up saying how much polymorphism you want to allow for each function, for example the return type of an operator (in PG) is determined exactly by the type of its arguments. In Haskell you would have the power to say, if you so wanted, that the type of an operator's RHS is determined exactly by the type of its LHS and return type, or even, in the most general case, that it's parametrized over all three types. Obviously the more types you leave free the more typing you have to do specifying all the types as the type-inference has less leverage to work with. I've been trying to think about how to apply a more modern type system to PG for a while and hence my comments about how things like NULL rows should be handled are based on this and may come across as rather dogmatic sometimes, it's about the only way I can get things to hold together without introducing much more complexity than seems strictly necessary. -- Sam http://samason.me.uk/
Sam Mason <sam@samason.me.uk> writes: > ... For example: > CREATE FUNCTION add(int,int) RETURNS int LANGUAGE sql > AS $$ SELECT $1 + $2; $$; > CREATE FUNCTION add(int,int,int DEFAULT NULL) RETURNS text LANGUAGE sql > AS $$ SELECT ($1 + $2)::text; $$; > What type should it attribute to the result of: > SELECT add(1,2); > In fact it doesn't seem to want to play ball at all. Even given the > apparently unambiguous: > SELECT 1+add(1,2); > or > SELECT 'hi'||add(1,2); > It doesn't get anywhere. Well, no, because our type resolution is bottom-up; it does not consider context when trying to resolve the overloaded "add()" function reference. "Unknown" is the only part of the system that allows for any delay at all in identifying the type of a construct, and even that is limited to a literal and its first-level surrounding context. It's interesting that you want to go in 100% the opposite direction from Kevin, who seems to want to eliminate type inference altogether. Maybe our current compromise isn't too bad, if it makes everybody unhappy in opposite directions ;-) regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > It's interesting that you want to go in 100% the opposite direction > from Kevin, who seems to want to eliminate type inference > altogether. Maybe our current compromise isn't too bad, if it makes > everybody unhappy in opposite directions ;-) Well, it's probably worth noting that, while I would prefer strong typing all around, I recognize that unless that's easier than I think to make as a configuration option, and strict is not the default, it's not going to happen. Too many people prefer things the other way. What I'm most concerned about are the corner cases where strict typing would give one non-error result and the inferred typing results in an error or a different result from the strict typing. I'm willing to argue that those are bugs, at least when the strongly typed behavior is mandated by the SQL standard. I pray that I never again have to deal with a database as generous with typing as Pick or Advanced Revelation. Seriously, working with those gave me that feeling you normally only get during a nightmare. :-( When a "date of birth" column can contain a list of phone numbers -- well, you want just want do something that makes you forget you saw that.... -Kevin
On Wed, Sep 02, 2009 at 01:19:07PM -0400, Tom Lane wrote: > Sam Mason <sam@samason.me.uk> writes: > > ... For example: > > > CREATE FUNCTION add(int,int) RETURNS int LANGUAGE sql > > AS $$ SELECT $1 + $2; $$; > > > CREATE FUNCTION add(int,int,int DEFAULT NULL) RETURNS text LANGUAGE sql > > AS $$ SELECT ($1 + $2)::text; $$; > > > What type should it attribute to the result of: > > > SELECT add(1,2); > > > In fact it doesn't seem to want to play ball at all. Even given the > > apparently unambiguous: > > > SELECT 1+add(1,2); > > or > > SELECT 'hi'||add(1,2); > > > It doesn't get anywhere. > > Well, no, because our type resolution is bottom-up; it does not consider > context when trying to resolve the overloaded "add()" function > reference. "Unknown" is the only part of the system that allows for any > delay at all in identifying the type of a construct, and even that is > limited to a literal and its first-level surrounding context. OK, I got distracted and my example was bad. > It's interesting that you want to go in 100% the opposite direction from > Kevin, who seems to want to eliminate type inference altogether. Maybe > our current compromise isn't too bad, if it makes everybody unhappy in > opposite directions ;-) The current compromise has worked for a while so there's no immediate reason to change it. I wasn't interpreting Kevin's request directly as I don't think he really means what he's saying and really wants an unambiguous way of writing literals in queries. I think it's easier to get this without specifying types everywhere and introducing constraints elsewhere to maintain what compatibility we've got with the SQL standard. If we did follow Kevin's request directly, should we also be specifying the type of NULL? -- Sam http://samason.me.uk/
Sam Mason <sam@samason.me.uk> wrote: > If we did follow Kevin's request directly, should we also be > specifying the type of NULL? I don't *think* the SQL standard requires that, and barring that I don't see any compelling reason to type NULL. One problem I do see with the current scheme, however, is that NULL *does* get typed to text when it makes no sense. In my view, a CASE expression which has only NULL for its return values, or an abbreviated form of CASE, such as COALESCE or NULLIF, should be evaluated exactly the same as if they were replaced by NULL itself. For example, COALESCE(NULL, NULL) currently yields NULL::text. In my view that's wrong. I view it as a bug, but that seems to be a hard sell here. Likewise, I think that in the query which started this thread, the cast to "char" is not sensible. I'm not sure how that could be resolved, but it doesn't seem correct to me. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > What I'm most concerned about are the corner cases where strict typing > would give one non-error result and the inferred typing results in an > error or a different result from the strict typing. I'm willing to > argue that those are bugs, at least when the strongly typed behavior > is mandated by the SQL standard. Are there any such cases? Your interpretation of strict typing seems to be that everything is type-labeled to start with, which means that type inference doesn't actually have anything to do. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> What I'm most concerned about are the corner cases where strict >> typing would give one non-error result and the inferred typing >> results in an error or a different result from the strict typing. >> I'm willing to argue that those are bugs, at least when the >> strongly typed behavior is mandated by the SQL standard. > > Are there any such cases? Your interpretation of strict typing > seems to be that everything is type-labeled to start with, which > means that type inference doesn't actually have anything to do. A simple, self-contained example derived from the OP: test=# create table t (c "char"); CREATE TABLE test=# insert into t values ('a'); INSERT 0 1 test=# select case when c = 'a' then 'Hey' else c end from t; c --- H (1 row) test=# select case when c = 'a' then 'Hey'::text else c end from t; c ----- Hey (1 row) -Kevin
On Wed, Sep 02, 2009 at 12:54:03PM -0500, Kevin Grittner wrote: > Sam Mason <sam@samason.me.uk> wrote: > > If we did follow Kevin's request directly, should we also be > > specifying the type of NULL? > > I don't *think* the SQL standard requires that, and barring that I > don't see any compelling reason to type NULL. The SQL standard certainly doesn't require it. It's just that you were requiring the types of literals that happened to be enclosed in quotes to have their type ascribed, so why not the NULL literal? > One problem I do see > with the current scheme, however, is that NULL *does* get typed to > text when it makes no sense. In my view, a CASE expression which has > only NULL for its return values, or an abbreviated form of CASE, such > as COALESCE or NULLIF, should be evaluated exactly the same as if they > were replaced by NULL itself. For example, COALESCE(NULL, NULL) > currently yields NULL::text. In my view that's wrong. I view it as a > bug, but that seems to be a hard sell here. Yes, that's because PG does a bottom-up solve of the type constraints. I think it should really result in an unknown type as well. > Likewise, I think that in the query which started this thread, the > cast to "char" is not sensible. I'm not sure how that could be > resolved, but it doesn't seem correct to me. All branches unify with the "char" type (i.e. they're all "char" or unknown) and hence the result of the expression is determined to be of type "char". -- Sam http://samason.me.uk/
I wrote: > A simple, self-contained example derived from the OP: > > test=# create table t (c "char"); > CREATE TABLE > test=# insert into t values ('a'); > INSERT 0 1 > test=# select case when c = 'a' then 'Hey' else c end from t; > c > --- > H > (1 row) > > test=# select case when c = 'a' then 'Hey'::text else c end from t; > c > ----- > Hey > (1 row) And I'm not even sure how I'd explain the rules to someone. (I guess that's because I don't understand them, really, but the other way sounds better....) test=# drop table t; DROP TABLE test=# create table t (c varchar(2)); CREATE TABLE test=# insert into t values ('a'); INSERT 0 1 test=# select case when c = 'a' then 'Hey' else c end from t; c ----- Hey (1 row) test=# select case when c = 'a' then 'Hey'::text else c end from t; c ----- Hey (1 row) -Kevin
Sam Mason <sam@samason.me.uk> wrote: > you were requiring the types of literals that happened to be > enclosed in quotes to have their type ascribed, so why not the NULL > literal? Well, unless things have changed in recent versions of the standard and I've missed the change, a series of characters enclosed in apostrophes is what the standard calls a "character string literal" and defines it to be be related to character based types such as varchar. As far as I'm aware, considering it to be undefined is a PostgreSQL extension. If you can point to something in the standard to show where I'm mistaken, I'll look it over. I'll go looking for something to back my memories on the topic, too, since my memory seems to be less reliable than it once was..... -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Well, unless things have changed in recent versions of the standard > and I've missed the change, a series of characters enclosed in > apostrophes is what the standard calls a "character string literal" > and defines it to be be related to character based types such as > varchar. That still seems to be the case in the draft of the 2003 standard I have: <general literal> ::= <character string literal> | <national character string literal> | <Unicode character string literal> | <binary string literal> | <datetime literal> | <interval literal> | <boolean literal> <character string literal> ::= [ <introducer><character set specification> ] <quote> [ <character representation>... ] <quote> [ { <separator> <quote> [ <character representation>... ] <quote> }... ] The ball's in your court to show something in the standard to say that a character string literal is ever *not* to be taken as a character string. -Kevin
On Wed, Sep 02, 2009 at 01:27:35PM -0500, Kevin Grittner wrote: > Sam Mason <sam@samason.me.uk> wrote: > > you were requiring the types of literals that happened to be > > enclosed in quotes to have their type ascribed, so why not the NULL > > literal? > > Well, unless things have changed in recent versions of the standard > and I've missed the change, a series of characters enclosed in > apostrophes is what the standard calls a "character string literal" > and defines it to be be related to character based types such as > varchar. As far as I'm aware, considering it to be undefined is a > PostgreSQL extension. If you can point to something in the standard > to show where I'm mistaken, I'll look it over. I'll go looking for > something to back my memories on the topic, too, since my memory seems > to be less reliable than it once was..... Sorry, I was referring to your explicit naming of types as in the following: http://archives.postgresql.org/message-id/4A9E3378020000250002A87E@gw.wicourts.gov reading it back again I'm not sure if that's what you meant now. Email is hard work isn't it! -- Sam http://samason.me.uk/
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > And I'm not even sure how I'd explain the rules to someone. text is preferred to "char" which is preferred to unknown. This particular example would be less confusing if 'Hey'::"char" threw an error, but that behavior is the result of an ancient (bad?) decision in the input function of one legacy datatype. It's not, IMNSHO, evidence of an overall failure of the type system as a whole. regards, tom lane
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Sam Mason <sam@samason.me.uk> wrote: >> If we did follow Kevin's request directly, should we also be >> specifying the type of NULL? > I don't *think* the SQL standard requires that, and barring that I > don't see any compelling reason to type NULL. Actually, AFAICS the SQL spec *does* require you to cast NULL literals explicitly in nearly all contexts. There are exceptions for assigning NULL directly to a field (in INSERT or UPDATE), and maybe some other places, but not many. The PG type system treats a bare NULL literal as "unknown", so most of the same issues come up for either NULL or literal constants. If you were to try to get rid of the "unknown" concept, I think the only place you'd find yourself violating the standard is the aforesaid narrow exceptions. Usability is a different question though ... regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> And I'm not even sure how I'd explain the rules to someone. > > text is preferred to "char" which is preferred to unknown. > > This particular example would be less confusing if 'Hey'::"char" > threw an error, but that behavior is the result of an ancient > (bad?) decision in the input function of one legacy datatype. > It's not, IMNSHO, evidence of an overall failure of the type system > as a whole. So the behavior of the "char" type is anomalous in this regard? Other character-based types behave like varchar (which has the behavior I would expect here)? That is encouraging. Why isn't the behavior of "char" in this regard considered a bug to be fixed? I'm not sure I'm exactly understanding why the varchar(2) worked, though. Perhaps it would be more clear if I grasped why *that* one does what I would think is the right thing. At this point my first guess would be that it discards the length for a varchar, and just treats it as text (or some other "neutral" character-based type). If so, perhaps "char" should do the same? -Kevin
On Wed, Sep 02, 2009 at 01:55:28PM -0500, Kevin Grittner wrote: > So the behavior of the "char" type is anomalous in this regard? Other > character-based types behave like varchar (which has the behavior I > would expect here)? That is encouraging. Why isn't the behavior of > "char" in this regard considered a bug to be fixed? I think there are lots of implicit casts going on that muddy the water with respect to what's going on at a type level. Things get promoted to TEXT easily in PG. I posted a patch in this thread: http://archives.postgresql.org/message-id/20090902091654.GL5407@samason.me.uk -- Sam http://samason.me.uk/
On Wed, Sep 02, 2009 at 01:37:20PM -0500, Kevin Grittner wrote: > That still seems to be the case in the draft of the 2003 standard I > have: > > <general literal> ::= > <character string literal> > | <national character string literal> > | <Unicode character string literal> > | <binary string literal> > | <datetime literal> > | <interval literal> > | <boolean literal> > <character string literal> ::= > [ <introducer><character set specification> ] > <quote> [ <character representation>... ] <quote> > [ { <separator> <quote> [ <character representation>... ] <quote> > }... ] > > The ball's in your court to show something in the standard to say that > a character string literal is ever *not* to be taken as a character > string. Huh, you're right. I'd always thought '2001-01-01' was a valid date literal, seems the standard has required it to be prefixed by DATE at least back to SQL92. -- Sam http://samason.me.uk/
On Wed, Sep 02, 2009 at 12:54:03PM -0500, Kevin Grittner wrote: > Sam Mason <sam@samason.me.uk> wrote: > > If we did follow Kevin's request directly, should we also be > > specifying the type of NULL? > > I don't *think* the SQL standard requires that, and barring that I > don't see any compelling reason to type NULL. I've just realized that either I'm missing your point entirely (it's happened before :) or this ignores the point entirely. PG wants to assign types to every expression, whether this expression will evaluate to a NULL value at run-time or not is immaterial in this regard. I think SQL wants to do the same, but I don't have as much conviction as Tom here. Once we're ascribing types to expressions then whether it happens to contain the literal "1", "'txt'" or "NULL" we're committed to giving it some type---the only question is which one. We thus need to type expressions consisting of just NULL constants. A fun puzzle to base any inductive solution on is what type to ascribe to the following: CREATE VIEW v (c) AS SELECT NULL; PG allows it, but the resulting view seems somewhat unusable. -- Sam http://samason.me.uk/
Sam Mason <sam@samason.me.uk> wrote: > I'd always thought '2001-01-01' was a valid date literal, seems the > standard has required it to be prefixed by DATE at least back to > SQL92. Yep. I don't know if it would be remotely feasible, but the implementation which seems like it would be "standard-safe" but still give reasonable concessions to those wanting to skip the extra keystrokes of declaring the type of literals which are not character based would be to go with the suggestion of having a character string literal type, and change the semantics such that if there is a valid interpretation of the statement with the character string literal taken as text, it should be used; if not, resolve by current "unknown" rules. Probably not feasible, but it seems likely it would make everyone reasonably happy if it could be done. That leaves the issue of NULL being forced to type text in the absence of any type info in CASE, COALESCE, and NULLIF. If there were a way to say that these could return unknown type, that would be solved. That doesn't seem as though it would be likely to be massively difficult, although I could be wrong about that. -Kevin
Sam Mason <sam@samason.me.uk> wrote: > CREATE VIEW v (c) AS > SELECT NULL; > > PG allows it, but the resulting view seems somewhat unusable. I'm not sure whether the only place the standard doesn't require a cast is on assignment, but this is one place that the standard clearly does require a cast, and I'm all for that. Requiring a cast anywhere else the standard requires it would not offend me; although not requiring it anywhere it doesn't generate nonstandard results, and where the semantics are relatively sane, wouldn't offend me, either. -Kevin
On Wed, Sep 02, 2009 at 02:41:32PM -0500, Kevin Grittner wrote: > Sam Mason <sam@samason.me.uk> wrote: > > > I'd always thought '2001-01-01' was a valid date literal, seems the > > standard has required it to be prefixed by DATE at least back to > > SQL92. > > Yep. I don't know if it would be remotely feasible, but the > implementation which seems like it would be "standard-safe" but still > give reasonable concessions to those wanting to skip the extra > keystrokes of declaring the type of literals which are not character > based would be to go with the suggestion of having a character string > literal type, and change the semantics such that if there is a valid > interpretation of the statement with the character string literal > taken as text, it should be used; if not, resolve by current "unknown" > rules. Probably not feasible, but it seems likely it would make > everyone reasonably happy if it could be done. Sounds as though that'll introduce more ambiguity into the system than there is already. > That leaves the issue of NULL being forced to type text in the absence > of any type info in CASE, COALESCE, and NULLIF. If there were a way > to say that these could return unknown type, that would be solved. > That doesn't seem as though it would be likely to be massively > difficult, although I could be wrong about that. Would be nice and I'd love it to work like this, but it's quite a big change I think. Currently, once PG has decided on a type it sticks with it against all further evidence. Another example: SELECT NULL AS c UNION SELECT '1' UNION SELECT 2; Once you're doing the above you're into the world of full parametric polymorphism and you're having to do much more complicated things at the type level. When my free time becomes infinite I'll have a chance! -- Sam http://samason.me.uk/
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Yep. I don't know if it would be remotely feasible, but the > implementation which seems like it would be "standard-safe" but still > give reasonable concessions to those wanting to skip the extra > keystrokes of declaring the type of literals which are not character > based would be to go with the suggestion of having a character string > literal type, and change the semantics such that if there is a valid > interpretation of the statement with the character string literal > taken as text, it should be used; if not, resolve by current "unknown" > rules. There is already a weak preference for resolving unknown as text in the presence of multiple alternatives. So I'm not sure that you're suggesting anything different from what happens now. In particular, weren't you the same person complaining a moment ago about COALESCE(NULL,NULL) defaulting to text? Why is that bad if the above is good? regards, tom lane
On Wed, Sep 02, 2009 at 02:59:54PM -0500, Kevin Grittner wrote: > Sam Mason <sam@samason.me.uk> wrote: > > CREATE VIEW v (c) AS > > SELECT NULL; > > > > PG allows it, but the resulting view seems somewhat unusable. > > I'm not sure whether the only place the standard doesn't require a > cast is on assignment, but this is one place that the standard clearly > does require a cast, and I'm all for that. I'm probably missing something obvious again, but where does it say that? Bear in mind that my simple NULL could be an arbitrarily complex expression, I've just chosen a simple NULL for pedagogic reasons. I can only see a requirement that the chosen type must be compatible. That seems to leave it open to arbitrarily choosing any type in this case. -- Sam http://samason.me.uk/
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Yep. I don't know if it would be remotely feasible, but the >> implementation which seems like it would be "standard-safe" but >> still give reasonable concessions to those wanting to skip the >> extra keystrokes of declaring the type of literals which are not >> character based would be to go with the suggestion of having a >> character string literal type, and change the semantics such that >> if there is a valid interpretation of the statement with the >> character string literal taken as text, it should be used; if not, >> resolve by current "unknown" rules. > > There is already a weak preference for resolving unknown as text in > the presence of multiple alternatives. So I'm not sure that you're > suggesting anything different from what happens now. In particular, > weren't you the same person complaining a moment ago about > COALESCE(NULL,NULL) defaulting to text? Why is that bad if the > above is good? Because COALESCE(NULL, NULL) has given no indication that it is character based, while 'x' is defined by the standard to be a character string literal. The two uses of "unknown" in typing seem to be solving different problems. Perhaps using the same flag for both is part of the problem. The above was suggesting we differentiate. -Kevin
Sam Mason <sam@samason.me.uk> wrote: > On Wed, Sep 02, 2009 at 02:59:54PM -0500, Kevin Grittner wrote: >> Sam Mason <sam@samason.me.uk> wrote: >> > CREATE VIEW v (c) AS >> > SELECT NULL; >> > >> > PG allows it, but the resulting view seems somewhat unusable. >> >> I'm not sure whether the only place the standard doesn't require a >> cast is on assignment, but this is one place that the standard >> clearly does require a cast, and I'm all for that. > > I'm probably missing something obvious again, but where does it say > that? Bear in mind that my simple NULL could be an arbitrarily > complex expression, I've just chosen a simple NULL for pedagogic > reasons. > > I can only see a requirement that the chosen type must be > compatible. That seems to leave it open to arbitrarily choosing any > type in this case. I'll look. On reflection, I based that statement on the fact that our SQL parser requires it, and it was largely created by scanning in the SQL syntax rules, using OCR on the image, and sed to format them for ANTLR. That doesn't guarantee that we didn't add that particular requirement for the sake of our own sanity some time later. I doubt I can get to looking through the standard for it today, though. By the way, the case of a bare NULL literal is potentially very different from a complex expression which might resolve to NULL. The latter would generally have some result type which could be determined even in tha absence of an actual value. For example, a null-capable column clearly has a type which can be determined. -Kevin
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> go with the suggestion of having a character string literal type, >> and change the semantics such that if there is a valid >> interpretation of the statement with the character string literal >> taken as text, it should be used; if not, resolve by current >> "unknown" rules. > > There is already a weak preference for resolving unknown as text in > the presence of multiple alternatives. So I'm not sure that you're > suggesting anything different from what happens now. It is certainly different for the example I gave up-thread involving "char". Perhaps that is a very unique and isolated situation. -Kevin
Sam Mason <sam@samason.me.uk> writes: > On Wed, Sep 02, 2009 at 02:59:54PM -0500, Kevin Grittner wrote: >> I'm not sure whether the only place the standard doesn't require a >> cast is on assignment, but this is one place that the standard clearly >> does require a cast, and I'm all for that. > I'm probably missing something obvious again, but where does it say > that? They express it as a syntactic constraint not a semantic one. NULL without decoration is a <contextually typed value specification> which is allowed in only a limited number of places. One place it's allowed is inside a <cast specification>, which is reachable from <value expression>. The latter is allowed in lots of places. regards, tom lane
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Because COALESCE(NULL, NULL) has given no indication that it is > character based, while 'x' is defined by the standard to be a > character string literal. The two uses of "unknown" in typing seem to > be solving different problems. Perhaps using the same flag for both > is part of the problem. The above was suggesting we differentiate. I've always thought that having the type system treat NULL and 'foo' alike was a feature, not a bug. In a formal sense the type information available is the same, ie, none. The argument that SQL says 'foo' must be character, so we should too, is greatly weakened by the fact that SQL has such an impoverished set of built-in types. If we want to treat user-defined types as anything approaching first-class types, we have to be pretty suspicious of that restriction. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > In a formal sense the type information available is the same, ie, > none. Well, in the sense that an international standard is formal, there is a formal difference, in that one has no type information and the other is a character string. However: > The argument that SQL says 'foo' must be character, so we should > too, is greatly weakened by the fact that SQL has such an > impoverished set of built-in types. If we want to treat > user-defined types as anything approaching first-class types, we > have to be pretty suspicious of that restriction. Another big clue for me in terms of the community perspective on this. Thanks. I think that the current approach leaves a small number of corner cases where we break SQL compliance. I think it's worthwhile trying to fix that. Whether that's best done by identifying the individual corners and fixing them independently as aberrations, or implementing some changes which provide the PostgreSQL extensions in a way that doesn't tend to break standard usage (and of course has little or no impact on current PostgreSQL users), is beyond my ken. I'm also not suggesting that this is the most urgent issue around. If anyone can suggest an appropriate wording for a TODO on the topic, I'll happily shut up and move on.... ;-) -Kevin
So one of the elephants in the room in this (rather dead-end) discussion is that one of the things "unknown" is good for is the fact that most clients don't bind their parameter types to specific types. Doing so is extremely cumbersome in just about every interface because it forces you to think about SQL types and look up constants for every parameter type. It's even worse if you have user-defined types on the server and have to start figuring out how to look these up dynamically. We use unknown to normally dtrt when a client passes a text literal representation without forcing them to tell us what type to interpret it as. Most client interfaces can just leave every parameter set to type unknown and let Postgres figure out what to do with everything. However it occurs to me that that doesn't work very well for substring(). If your client interface doesn't implicitly bind the second argument to integer it'll be interpreted as text by default and you get what is usually going to not be what you want; postgres=# select substring('foobar456',4); substring ----------- bar456 (1 row) postgres=# select substring('foobar456','4'); substring ----------- 4 (1 row) This for example makes it awkward to use from Perl: $ perl -e 'use Data::Dumper; use DBI; use DBD::Pg; $d = DBI->connect("dbi:Pg:database=postgres;host=localhost;port=1234"); print Dumper $d->selectall_arrayref(q{select substring('\''foobar'\'',?)},{},'4');' $VAR1 = [ [ undef ] ]; $ perl -e 'use Data::Dumper; use DBI; use DBD::Pg; $d = DBI->connect("dbi:Pg:database=postgres;host=localhost;port=1234"); print Dumper $d->selectall_arrayref(q{select substring('\''foobar'\'',?)},{},4);' $VAR1 = [ [ undef ] ];
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I think that the current approach leaves a small number of corner > cases where we break SQL compliance. I think it's worthwhile trying > to fix that. Whether that's best done by identifying the individual > corners and fixing them independently as aberrations, or implementing > some changes which provide the PostgreSQL extensions in a way that > doesn't tend to break standard usage (and of course has little or no > impact on current PostgreSQL users), is beyond my ken. It's certainly worth looking into. I would not want to prejudge how to fix such issues; I think the best way to proceed would be to create a list of the problems first. One other point worth making is that we don't always consider SQL compliance to be a hard requirement that trumps every other consideration. An example is case-folding of identifiers; it's been pretty well agreed that between readability and backwards- compatibility considerations, we simply aren't going to switch over to doing it exactly like the spec. So any proposed tweaks in this area would be considered as tradeoffs between better spec compliance and other goals. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > One other point worth making is that we don't always consider SQL > compliance to be a hard requirement that trumps every other > consideration. Point noted. > An example is case-folding of identifiers; it's been pretty well > agreed that between readability and backwards-compatibility > considerations, we simply aren't going to switch over to doing it > exactly like the spec. I've left that one alone both because I saw the discussions of it and because our framework automatically fixes identifier capitalization based on our metadata and then wraps the all identifiers in quotes. As long as PostgreSQL honors the rules about quoted identifiers, we're golden. :-) (Although, one of these days I may try to do something about how psql, etc. treat identifiers which are all lower case. The refusal to consider the quoted form a match during tab-completion, for example, is a regular annoyance.) > So any proposed tweaks in this area would be considered as tradeoffs > between better spec compliance and other goals. Fair enough. I consider myself warned. ;-) -Kevin
On Wed, Sep 2, 2009 at 3:34 PM, Sam Mason<sam@samason.me.uk> wrote: > On Wed, Sep 02, 2009 at 12:54:03PM -0500, Kevin Grittner wrote: >> Sam Mason <sam@samason.me.uk> wrote: >> > If we did follow Kevin's request directly, should we also be >> > specifying the type of NULL? >> >> I don't *think* the SQL standard requires that, and barring that I >> don't see any compelling reason to type NULL. > > I've just realized that either I'm missing your point entirely (it's > happened before :) or this ignores the point entirely. =A0PG wants to > assign types to every expression, whether this expression will evaluate > to a NULL value at run-time or not is immaterial in this regard. =A0I > think SQL wants to do the same, but I don't have as much conviction as > Tom here. =A0Once we're ascribing types to expressions then whether it > happens to contain the literal "1", "'txt'" or "NULL" we're committed to > giving it some type---the only question is which one. =A0We thus need to > type expressions consisting of just NULL constants. > > A fun puzzle to base any inductive solution on is what type to ascribe > to the following: > > =A0CREATE VIEW v (c) AS > =A0 =A0SELECT NULL; 'a of course. ...Robert
Sam Mason <sam@samason.me.uk> writes: > Would something like the included patch be accepted? > [ patch to make charin() throw error for excess input ] I remembered the reason why this idea got rejected previously. The winning argument was that "char" should behave as nearly as possible like the standard type char(1). And guess what, that truncates silently: regression=# select 'foo'::char(1); bpchar -------- f (1 row) However, I think we forgot that that is only true for an explicit coercion to char. If you do this: regression=# create table fool (f1 char(1)); CREATE TABLE regression=# insert into fool values('123'); ERROR: value too long for type character(1) (I'm too lazy to go look up chapter and verse, but all of this is per SQL standard. Consistency ain't their strong point.) I certainly don't want to have "char" emulate the misbegotten decision to have explicit and implicit coercions behave differently. So it looks to me like the argument to make "char" work like char(1) doesn't actually help us much to decide if an error should be thrown here or not. On the whole, throwing an error seems better from a usability perspective. Comments? regards, tom lane PS: I would suggest that we borrow the wording and SQLSTATE of the existing message, if we do make it throw an error.
On Fri, Sep 4, 2009 at 11:15 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > I certainly don't want to have "char" emulate the misbegotten decision > to have explicit and implicit coercions behave differently. =A0So it > looks to me like the argument to make "char" work like char(1) doesn't > actually help us much to decide if an error should be thrown here or > not. =A0On the whole, throwing an error seems better from a usability > perspective. I like errors. ...Robert
Tom Lane <tgl@sss.pgh.pa.us> wrote: > I certainly don't want to have "char" emulate the misbegotten > decision to have explicit and implicit coercions behave differently. > So it looks to me like the argument to make "char" work like char(1) > doesn't actually help us much to decide if an error should be thrown > here or not. On the whole, throwing an error seems better from a > usability perspective. I feel that the behavior of "char" in at least this case should match char(1) (or just plain char): test=# select case when true then 'xxx' else 'a'::"char" end from t; case ------ x (1 row) test=# select case when true then 'xxx' else 'a'::char(1) end from t; case ------ xxx (1 row) test=# select case when true then 'xxx' else 'a'::char end from t; case ------ xxx (1 row) Much as the reason for the behavior of "char" may seem clear when inside the code looking out, it is astonishing for someone writing application code. -Kevin
On Fri, Sep 4, 2009 at 11:59 AM, Kevin Grittner<Kevin.Grittner@wicourts.gov> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I certainly don't want to have "char" emulate the misbegotten >> decision to have explicit and implicit coercions behave differently. >> So it looks to me like the argument to make "char" work like char(1) >> doesn't actually help us much to decide if an error should be thrown >> here or not. =A0On the whole, throwing an error seems better from a >> usability perspective. > > I feel that the behavior of "char" in at least this case should match > char(1) (or just plain char): > > test=3D# select case when true then 'xxx' else 'a'::"char" end from t; > =A0case > ------ > =A0x > (1 row) > > test=3D# select case when true then 'xxx' else 'a'::char(1) end from t; > =A0case > ------ > =A0xxx > (1 row) > > test=3D# select case when true then 'xxx' else 'a'::char end from t; > =A0case > ------ > =A0xxx > (1 row) > > Much as the reason for the behavior of "char" may seem clear when > inside the code looking out, it is astonishing for someone writing > application code. > > -Kevin Yeah, I agree. That's really confusing. ...Robert
On Fri, Sep 04, 2009 at 10:59:48AM -0500, Kevin Grittner wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I certainly don't want to have "char" emulate the misbegotten > > decision to have explicit and implicit coercions behave differently. > > So it looks to me like the argument to make "char" work like char(1) > > doesn't actually help us much to decide if an error should be thrown > > here or not. On the whole, throwing an error seems better from a > > usability perspective. I'm all for the error being thrown; no particular feelings about whether it only happens during "explicit" casts or everywhere. There's always the substring() function if the user wants it. > I feel that the behavior of "char" in at least this case should match > char(1) (or just plain char): Hum, I'm not sure if that's useful behavior. As far as I can tell, you seem to be wanting in-memory representations of "string like types" to all use the same representation and only use the actual types when saving to/from disk. This would give behavior that is consistent with what you're for asking below. > test=# select case when true then 'xxx' else 'a'::"char" end from t; > case > ------ > x > (1 row) With the patch I gave, or something like it, this would throw an error because 'xxx' is being used to initialize a value of "char" type. > test=# select case when true then 'xxx' else 'a'::char(1) end from t; > case > ------ > xxx > (1 row) This gives back 'xxx' because the types character and bpchar have an implicit cast defined between them. The result is thus of type bpchar which places no restrictions on the length (that would apply here anyway), with 'a' being cast from character to bpchar implicitly. > test=# select case when true then 'xxx' else 'a'::char end from t; > case > ------ > xxx > (1 row) This does the same as above. > Much as the reason for the behavior of "char" may seem clear when > inside the code looking out, it is astonishing for someone writing > application code. I think things would be clearer if an error was thrown in the constructor of "char" types when the length wasn't correct. Implicit casts are muddying the waters as well, which doesn't aid understanding of what's going on here. -- Sam http://samason.me.uk/
Sam Mason <sam@samason.me.uk> wrote: > you seem to be wanting in-memory representations of "string like > types" to all use the same representation and only use the actual > types when saving to/from disk. Doing so when actually assigning to *something* of the more specific type would probably be better. In my view, that's not happening here; although I see that the code currently doesn't recognize the difference. >> test=# select case when true then 'xxx' else 'a'::"char" end from t; >> case >> ------ >> x >> (1 row) > > With the patch I gave, or something like it, this would throw an > error because 'xxx' is being used to initialize a value of "char" > type. As I read the semantics of the CASE predicate, it returns one of the given values. 'x' is not one of the given values, regardless of type. I don't think an error is the right thing, I think returning the specified value is the right thing. I don't think it's a good thing that the type system decides that the result type for this case predicate is "char" and that 'xxx' needs to be coerced to that type. -Kevin
On Fri, 2009-09-04 at 11:15 -0400, Tom Lane wrote: > On the whole, throwing an error seems better from a usability > perspective. > > Comments? Agreed. Regards, Jeff Davis
On Fri, Sep 04, 2009 at 12:26:19PM -0500, Kevin Grittner wrote: > Sam Mason <sam@samason.me.uk> wrote: > > Kevin Grittner wrote: > >> test=# select case when true then 'xxx' else 'a'::"char" end from t; > > > > 'xxx' is being used to initialize a value of "char" type. > > As I read the semantics of the CASE predicate, it returns one of the > given values. 'x' is not one of the given values, regardless of type. You seem to be confused about the difference between literals and values. Maybe a different example: SELECT '00001'::int; I get '1' back from that, and not '00001'. This is because '00001' is the literal that is parsed into a value of type integer and then the query is run and this same value is asked to convert itself back into a literal to be written out to the screen. Back to your example; you're asking PG to interpret the literal 'xxx' as a "char" and it does that (but doesn't give any error back when it chucks data away). This behavior may be confusing because for text types the literal exactly the same as the value itself, but this is only a very specific behavior of text types. For example, '{"1"}', '{1}' and even '{"+001"}' are all literal representations of identical integer arrays. > I don't think an error is the right thing, I think returning the > specified value is the right thing. I don't think it's a good thing > that the type system decides that the result type for this case > predicate is "char" and that 'xxx' needs to be coerced to that type. I fail to see how an error isn't the right thing; if we try with some other types let see if you think any of these should succeed. SELECT CASE WHEN TRUE THEN text 'xxx' ELSE 0 END; SELECT CASE WHEN TRUE THEN text 'xxx' ELSE TRUE END; SELECT CASE WHEN TRUE THEN text 'xxx' ELSE '{1}'::INT[] END; SELECT CASE WHEN TRUE THEN text 'xxx' ELSE array [1] END; "char" is no different other than, by default, it happens to look a lot like any value of text type. It's a different type (that happens to have some implicit casts to confuse things) and hence I can't see why invalid literals should not be thrown out. -- Sam http://samason.me.uk/
Sam Mason <sam@samason.me.uk> wrote: > I fail to see how an error isn't the right thing; if we try with > some other types let see if you think any of these should succeed. > > SELECT CASE WHEN TRUE THEN text 'xxx' ELSE 0 END; > SELECT CASE WHEN TRUE THEN text 'xxx' ELSE TRUE END; > SELECT CASE WHEN TRUE THEN text 'xxx' ELSE '{1}'::INT[] END; > SELECT CASE WHEN TRUE THEN text 'xxx' ELSE array [1] END; > > "char" is no different other than, by default, it happens to look a > lot like any value of text type. So much so that it has the same name as a text type (wrapped in quotes) and behaves a lot like one: test=# SELECT CASE WHEN TRUE THEN text 'xxx' ELSE 'a'::"char" END; case ------ xxx (1 row) test=# select upper('a'::"char"); upper ------- A (1 row) test=# select char_length('a'::"char"); char_length ------------- 1 (1 row) test=# select substring('a'::"char" from 1 for 1); substring ----------- a (1 row) Making it behave so much like character-based types and giving it a name which implies that it is character based and storing a character in it, but then not treating it like other character types in the CASE context is bound to cause surprises for people. > It's a different type (that happens to have some implicit casts to > confuse things) and hence I can't see why invalid literals should > not be thrown out. Only, I guess, because of the name. If it weren't called "char" I guess I wouldn't be concerned about people expecting it to behave something like char. If "char" behaved more like char, the 'xxx' literal wouldn't be taken as input to the type in the above CASE statement. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Only, I guess, because of the name. If it weren't called "char" I > guess I wouldn't be concerned about people expecting it to behave > something like char. If "char" behaved more like char, the 'xxx' > literal wouldn't be taken as input to the type in the above CASE > statement. I'm not certain what you're trying to say, but the above is complete nonsense ... regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not certain what you're trying to say, but the above is complete > nonsense ... Ah, so it is. I thought someone up-thread said that in this case it wound up as bpchar; but I see that's not so: test=# select pg_typeof((select case when true then 'xxx' else 'a'::char(1) end)); pg_typeof ----------- character (1 row) All that's done is to strip off the length. I guess that since the "char" type is documented as being for internal use, these issues would only affect those who choose to write queries against catalog tables or use an internal type in their tables, so I guess it's not worth going to extremes to make it behave like char. Given all that, I'll conceed the point, and give a +1 for the error message. -Kevin
On Fri, Sep 04, 2009 at 02:01:07PM -0500, Kevin Grittner wrote: > > "char" is no different other than, by default, it happens to look a > > lot like any value of text type. > > So much so that it has the same name as a text type (wrapped in > quotes) and behaves a lot like one: You're getting bitten by implicit casts. Try creating a view of any of your examples and seeing what the resulting types are, or in 8.4 an EXPLAIN VERBOSE seems to show how it's typed the expression. -- Sam http://samason.me.uk/