Thread: Parser bug (?)
Hi Tom, This looks like a parser bug in 6.4. I found it while trying to run pg_upgrade: bray=> create table junk ("singleton" "char"); CREATE bray=> drop table junk; DROP bray=> create table junk ("singleton" "char" default 'M'); ERROR: DEFAULT: const type mismatched bray=> create table junk ("singleton" char default 'M'); CREATE pg_dump puts all type names in double quotes, which triggers this bug when pg_upgrade is run, with the result that the upgrade fails. [Machine is i686 Linux 2.1.125 with glibc 2.0.7u] -- Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver PGP key from public servers; key ID32B8FAA1 ======================================== "Go ye therefore, and teach all nations, baptizingthem in the name of the Father, and of the Son, and of the Holy Ghost; Teaching them to observe all things whatsoever I have commanded you; and, lo, I am with you alway, even unto the end of the world. Amen." Matthew 28:19,20
"Oliver Elphick" <olly@lfix.co.uk> writes: > This looks like a parser bug in 6.4. I found it while trying to run > pg_upgrade: > bray=> create table junk ("singleton" "char"); > CREATE > bray=> drop table junk; > DROP > bray=> create table junk ("singleton" "char" default 'M'); > ERROR: DEFAULT: const type mismatched > bray=> create table junk ("singleton" char default 'M'); > CREATE Oh my, that's interesting ... and it looks like it is closely related to my gripe a few days ago that type char is actually implemented as char(1), or more accurately type bpchar(1). I didCREATE TABLE t1 (c1 char, c2 "char"); and then looked at the info for the relation in pg_attribute: attname|atttypid|attlen|attnum|attnelems|atttypmod|attbyval|attalign -------+--------+------+------+---------+---------+--------+-------- c1 | 1042| -1| 1| 0| 5|f |i c2 | 18| 1| 2| 0| -1|t |c pg_type contains oid|typname|typowner|typlen|typprtlen|typbyval|typtype|typisdefined|typdelim|typrelid|typelem|typinput|typoutput|typreceive|typsend |typalign|typdefault ----+-------+--------+------+---------+--------+-------+------------+--------+--------+-------+--------+---------+----------+---------+--------+---------- 18|char | 256| 1| 1|t |b |t |, | 0| 0|charin |charout |charin |charout |c | 1042|bpchar | 256| -1| -1|f |b |t |, | 0| 18|bpcharin|bpcharout|bpcharin |bpcharout|i | So it appears that *something* (probably the parser) is converting the unquoted type name char to a bpchar (blank-padded char array), but if it's quoted then you actually get the primitive char type. Seems to me that that's a bug: if you ask for char, and not char(1), you should get char. At least I'd like it to be a bug, because I want to be able to use the unvarnished char type, and I don't want to depend on some weird quoting behavior to get at it. The second bug exhibited by Oliver's example is that a literal 'M' is not recognized as compatible with the primitive char type. That won't do either. I tried coercing the 'M' to a char explicitly, just to see what would happen, and got a coredump: create table junk ("singleton" "char" default 'M'::char); pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally beforeor while processing the request. We have lost the connection to the backend, so further processing is impossible. Terminating. Backtrace is #0 0x800c4234 in _strlen () #1 0xca87c in FlattenStringList (list=0x4008dd58) at gram.y:5047 #2 0xc6380 in yyparse () at gram.y:845 #3 0xcac8c in parser ( str=0x7b034218 "create table junk (\"singleton\" \"char\" default 'M'::char);", typev=0x0, nargs=0)at parser.c:53 #4 0xf1c0c in pg_parse_and_plan ( query_string=0x7b034218 "create table junk (\"singleton\" \"char\" default 'M'::char);",typev=0x7b038550, nargs=0, queryListP=0x7b036358, dest=Remote, aclOverride=0) at postgres.c:420 Ooops. Looks like FlattenStringList is getting handed a node type it was not expecting. The elements of the list it is handed are: $6 = {type = T_String, val = {str = 0x41b6c "CAST", ival = 269164, dval = 5.7116487507937656e-309}} $8 = {type = T_String, val = {str = 0x4008d910 "'M'", ival = 1074321680, dval = 3.105987548828125}} $10 = {type = T_String, val = {str = 0x41b74 "AS", ival = 269172, dval = 5.7118185104570428e-309}} $12 = {type = T_TypeName, val = {str = 0x5 <Address 0x5 out of bounds>, ival = 5, dval = 1.0609978954826362e-313}} So that's another bug, though possibly unrelated. regards, tom lane
> Oh my, that's interesting ... and it looks like it is closely related > to my gripe a few days ago that type char is actually implemented as > char(1), or more accurately type bpchar(1). That's a feature! Sorry I missed your gripe session, but it apparently flew by in the e-mail and I didn't spot it. Do you want to dredge it up again, or did one gripe get you feeling better? > So it appears that *something* (probably the parser) is converting the > unquoted type name char to a bpchar (blank-padded char array), > but if it's quoted then you actually get the primitive char type. > Seems to me that that's a bug: if you ask for char, and not char(1), > you should get char. I'm pretty sure that any benefit to a visible "unvarnished char" are outweighed by the burden of fully implementing all of the "uvchar" support functions. It would need to be entirely transparent when mixed with char(), varchar(), and text types, and it probably isn't at the moment. As a reminder to others (as I'm sure you are already aware of this, but want the feature anyway :), an apparent space savings of 4 bytes (1 bytes vs 5 bytes) for char vs char(1) is not the 80% savings it appears; there is a large per-tuple overhead that pretty much swamps any tiny data type. > The second bug exhibited by Oliver's example is that a literal 'M' > is not recognized as compatible with the primitive char type. > That won't do either. > I tried coercing the 'M' to a char explicitly, just to see what would > happen, and got a coredump: > So that's another bug, though possibly unrelated. We should continue the discussion before I try for a fix, to make sure that you don't get railroaded into *never* being able to use "uvchar" ;) - Tom
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: >> ... my gripe a few days ago that type char is actually implemented as >> char(1), or more accurately type bpchar(1). > That's a feature! Sorry I missed your gripe session, but it apparently > flew by in the e-mail and I didn't spot it. I think I sent it to pgsql-sql not the hackers list. > Do you want to dredge it up again, or did one gripe get you feeling > better? No, I'm still unhappy. Basically, what I'm using "plain char" for is as a poor man's enumerated type. I have a lot of fields that have only half a dozen legal values, and what I do is assign somewhat-mnemonic character codes to each of the allowed values. This is compact, easy to process, and I can examine the raw data at need without straining my tired brain cells very far to remember what the codes stand for. (It'd be even better to have a *real* enumerated-type facility, I suppose, but this works fine and doesn't take a major amount of work to implement...) Only problem is that the fields aren't nearly as compact as I thought they were. > As a reminder to others (as I'm sure you are already aware of this, but > want the feature anyway :), an apparent space savings of 4 bytes (1 > bytes vs 5 bytes) for char vs char(1) is not the 80% savings it appears; You forgot the 4-byte alignment requirement of char(n). These things really take 8 bytes each, not the 1 byte each that several "uvchar"s in a row would take. > there is a large per-tuple overhead that pretty much swamps any tiny > data type. When you have half a dozen of these per tuple (which I do), it starts to add up. Besides, the per-tuple overhead has redeeming social value --- all those fields have a defensible reason for being there. A 700% storage overhead with no defensible purpose is harder to swallow. > I'm pretty sure that any benefit to a visible "unvarnished char" are > outweighed by the burden of fully implementing all of the "uvchar" > support functions. It would need to be entirely transparent when mixed > with char(), varchar(), and text types, and it probably isn't at the > moment. Well, all I really give a darn about is whether I can insert, update, and select on equality. (Indeed, since I'm using these as enums, I have no need whatever for them to be transparently interchangeable with char-string types. However, I do want them to display as "char" and not "int1", since small integer codes would lose the mnemonic aspect...) A quick look at pg_operator.h indicates that char is reasonably well supplied with functions, anyway. Once I found out that I could get to "uvchar" with this little quoting hack, I satisfied myself that it does everything I need done. But I don't care for relying on what is presumably just a parser bug to get at a fundamental data type. I didn't object to removing char2, char4, and the other fixed-width special character types, but I think that plain char has applications that justify leaving it in there. I'm willing to consider giving it a different name if you really insist that char should be char(1) (though I don't agree with that). regards, tom lane
All points are well taken, but I will object to the "700%" number. Still misleading imho. But that's not the point... > I didn't object to removing char2, char4, and the other fixed-width > special character types, but I think that plain char has applications > that justify leaving it in there. I'm willing to consider giving > it a different name if you really insist that char should be char(1) > (though I don't agree with that). The SQL92 standard sez that char is shorthand for char(1). Trying to make the behavior of char vs char() *completely* transparent as called for in the standard is one reason why I squashed them together from a user's point of view. I only left char in the mix because it is used internally in Postgres system tables and I didn't want to open that can of worms (and don't intend to, so don't panic). I shouldn't hide behind the SQL92 standards argument, since there are some parts of the standard which are so hideous that they should be ignored. But I'm not certain this is one of them. Another detail: we would need to figure out how to do locale and multibyte support for this "char" type to allow equivalence with multibyte char(1). Not sure how to do that since this type *is* used internally and probably can't be resized like that. Speaking of which, are you or Bruce (or anyone else) thinking of testing the unsigned int vs size_t for the Size typedef recently mentioned by Oliver? Can we count on all platforms to have size_t defined? If so, we could consider adding it in for v6.4.1. I'm currently working on getting min() and max() to work with string types, but could drop that to polish things for a v6.4.1 release. - Tom
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: > All points are well taken, but I will object to the "700%" number. Still > misleading imho. OK, you understated for effect, I overstated for effect, we're even ;-) > The SQL92 standard sez that char is shorthand for char(1). Um. Well, what do you think of calling it "char1" or some such? > I only left char in the mix because it is used > internally in Postgres system tables and I didn't want to open that can > of worms (and don't intend to, so don't panic). Now that you mention it, I've noticed a few places where plain char is used in the system tables in *exactly* the way I'm talking about, ie, as a simple form of enumerated type. For example, the typtype and typalign fields in pg_type. All I'm asking for is a reliable way to get at that same functionality in a user table. > Another detail: we would need to figure out how to do locale and > multibyte support for this "char" type to allow equivalence with > multibyte char(1). Not sure how to do that since this type *is* used > internally and probably can't be resized like that. Urgh. Multibyte char support is the *last* thing I'm looking for in this context. How about we rename the type "justoneplainvanillaasciichar" and have done with it ;-) ? [ caution, topic drift ahead ] > Speaking of which, are you or Bruce (or anyone else) thinking of testing > the unsigned int vs size_t for the Size typedef recently mentioned by > Oliver? Can we count on all platforms to have size_t defined? I was looking at that. size_t exists on all platforms, the trouble is that pre-ANSI platforms are not very consistent about exactly which system header file(s) define it. If we make c.h (and c.h, not postgres.h, is what defines Size) depend on size_t then we may find a few compilation failures due to .c files not pulling in all the right system headers before including c.h. On the other hand, c.h unconditionally requires <stdlib.h> which itself is an ANSI-ism. And stdlib is one of the headers that ANSI specifies to define size_t --- so any ANSI-compliant header fileset *will* support this change. It's only not-quite-ANSI systems that we risk problems with here. So probably I'm being too conservative to worry at all. If you don't have a reasonably ANSI-conformant compiler and header fileset you're gonna have a heck of an unpleasant time building Postgres anyway, I suspect. c.h says that Size is intended to represent the result type of sizeof, and that most certainly is size_t, *not* any other type, according to the ANSI spec. So if Size is being used in the code to represent the size of in-memory objects then it definitely ought to be size_t. In theory this change is absolutely correct, and my guess is we should do it. But if we see a few glitches on obsolete platforms, don't say I didn't warn you ;-). regards, tom lane
> Speaking of which, are you or Bruce (or anyone else) thinking of testing > the unsigned int vs size_t for the Size typedef recently mentioned by > Oliver? Can we count on all platforms to have size_t defined? If so, we > could consider adding it in for v6.4.1. I'm currently working on getting > min() and max() to work with string types, but could drop that to polish > things for a v6.4.1 release. I think everyone has size_t. It is s POSIX requirement. Sounds like a good change. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> I think everyone has size_t. It is s POSIX requirement. Sounds like a > good change. Tom Lane points out that some of our more esoteric platforms may be missing it, so it sounds like it should be an optional v6.4.1 patch and a v6.5 built-in feature (to allow platform testing). Who is testing now? - Tom
Is this fixed? > Hi Tom, > > This looks like a parser bug in 6.4. I found it while trying to run > pg_upgrade: > > bray=> create table junk ("singleton" "char"); > CREATE > bray=> drop table junk; > DROP > bray=> create table junk ("singleton" "char" default 'M'); > ERROR: DEFAULT: const type mismatched > bray=> create table junk ("singleton" char default 'M'); > CREATE > > pg_dump puts all type names in double quotes, which triggers this bug > when pg_upgrade is run, with the result that the upgrade fails. > > [Machine is i686 Linux 2.1.125 with glibc 2.0.7u] > > -- > Oliver Elphick Oliver.Elphick@lfix.co.uk > Isle of Wight http://www.lfix.co.uk/oliver > PGP key from public servers; key ID 32B8FAA1 > ======================================== > "Go ye therefore, and teach all nations, baptizing them > in the name of the Father, and of the Son, and of the > Holy Ghost; Teaching them to observe all things > whatsoever I have commanded you; and, lo, I am with > you alway, even unto the end of the world. Amen." > Matthew 28:19,20 > > > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Speaking of which, are you or Bruce (or anyone else) thinking of testing > the unsigned int vs size_t for the Size typedef recently mentioned by > Oliver? Can we count on all platforms to have size_t defined? If so, we > could consider adding it in for v6.4.1. I'm currently working on getting > min() and max() to work with string types, but could drop that to polish > things for a v6.4.1 release. Added to current tree. 6.4.1 tree too likely to break. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Is this fixed? No, not yet. Put it on the "show stopper" list for v6.4.1... The explanation is that CHAR is a "keyword" so is passed from the scanner to the parser explicitly, which then makes sure that the various legal forms of char type declarations get translated into the internal bpchar type. However, tokens surrounded by double quotes are passed as IDENTs, which bypasses this mechanism completely. I'll look at modifying pg_dump to leave out the double quotes on type fields unless there is mixed or upper case. Also, I'll look at having some automatic conversions between bpchar and char (though this might not be available for v6.4.1 since it may require catalog changes). - Tom > > This looks like a parser bug in 6.4. I found it while trying to run > > pg_upgrade: > > bray=> create table junk ("singleton" "char" default 'M'); > > ERROR: DEFAULT: const type mismatched > > pg_dump puts all type names in double quotes, which triggers this > > bug when pg_upgrade is run, with the result that the upgrade fails.