Thread: Re: [BUGS] Bug #513: union all changes char(3) column definition
Rae Stiening (stiening@cannon.astro.umass.edu) writes: > This script demonstrates the incorrect restoration of a > table created by a union under postgresql version 7.1.3. What's really going on here is that 1. The CREATE TABLE AS command creates a column with type bpchar and typmod -1 (ie, no specific length enforced). 2. pg_dump dumps this column with the type identified as "character". 3. On reload, "character" is interpreted as "character(1)". While each of these behaviors is justifiable to some degree when considered by itself, their interaction is not good. It is actually not possible for pg_dump to dump this table correctly, because there is no CREATE TABLE command it can give to reproduce the type/typmod combination. I thought a little bit about trying to disallow the creation of such tables, but I don't believe that can work in the general case. CREATE TABLE AS cannot be expected to be able to extract a suitable typmod from complex expressions. We could think about replacing "bpchar/-1" with "text", but that only fixes the problem for bpchar; we have the exact same issue with numeric, and there is no comparable workaround for numeric. So I think what we need to do is rejigger the type display and entry rules so that there is a recognized representation for "bpchar with no typmod", "numeric with no typmod", etc, and the parser will not bogusly insert default length limits when it sees this representation. For char I propose that this representation be bpchar ie the underlying type name. This is a bit ugly, but since the notion of char(n) with no particular limit is definitely non-SQL92 anyway, using a non-SQL name seems appropriate. For varchar, it already works to write any of varchar char varying character varying This does not conflict with SQL92 since the standard doesn't allow the length spec to be omitted in these types, and so there's not an expected default of 1 as there is for char. For numeric, we could say that the representation for typmod -1 is "numeric" (double quotes required) ... but I really wonder why we have the convention that numeric defaults to numeric(30,6) in the first place. Why shouldn't the default behavior be to use typmod -1 (no limit)? The (30,6) convention cannot be justified on the basis of the SQL spec; it says the default precision is implementation-defined and the default scale is zero. "No limit" looks like a good enough implementation-defined precision value to me, and as for the scale, defaulting to no scale adjustment is less likely to make anyone unhappy than defaulting to scale zero. So I propose that we remove all notion of a default precision and scale for numeric, and say that numeric written without a precision/scale spec means numeric with typmod -1. For bit, the SQL spec requires us to interpret unadorned bit as meaning bit(1), so there seems little choice but to use "bit" (quotes required) for the typmod -1 case. For varbit, "bit varying" already works and need not be messed with; same rationale as for varchar. As far as implementation goes, on the output side all that's needed is some simple changes in format_type to emit the desired representation. In the parser, we need to remove transformColumnType's diddling of typmod and instead insert the correct default typmod in gram.y. We can't do it later than gram.y since the distinction between "bit" and bit, etc, is not visible later. Comments? regards, tom lane
Tom Lane writes: > CREATE TABLE AS cannot be expected to be able to extract a suitable > typmod from complex expressions. I don't think that would be entirely unreasonable. The current system drops typmods at first sight when it gets scared of them, but in many cases it would make sense for them to propagate much further. We've already seen a case where "no typmod" means different things in different places for lack of a good way to keep the information. If we ever want to allow user-defined data types to have atttypmods a solution would be necessary. Here's another example where the behaviour is not consistent with other places: peter=# create table one (a bit(4)); CREATE peter=# create table two (b bit(6)); CREATE peter=# insert into one values (b'1001'); INSERT 16570 1 peter=# insert into two values (b'011110'); INSERT 16571 1 peter=# select * from one union select * from two; 011110 1001 What's the data type of that? The fact is that bit without typmod makes no sense, even less so than char without typmod. A possible solution would be that data types can register a typmod-resolver function, which takes two typmods and returns the typmod to make both expressions union-compatible. For varchar(n) and varchar(m) is would return max(m,n), for bit(n) and bit(m) it would return an error if m<>n. (The behaviour of char() could be either of these two.) Surely a long-term idea though... -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane writes: >> CREATE TABLE AS cannot be expected to be able to extract a suitable >> typmod from complex expressions. > I don't think that would be entirely unreasonable. Well, it might not be completely impossible, but I think it's well on the far side of unreasonable. For *every operator* that produces a result of any of the typmod-using types, we'd have to maintain an auxiliary bit of code that can predict the result typmod. That's a lot of code, and when you start considering user-defined functions it gets worse. And all for what? Not to do anything useful, but only to *eliminate* functionality. Perhaps char without typmod is unnecessary (since it reduces to text), but numeric without typmod seems highly useful to me. Strikes me as a very large amount of work to go in the wrong direction... > A possible solution would be that data types can register a > typmod-resolver function, which takes two typmods and returns the typmod > to make both expressions union-compatible. This only handles the UNION and CASE merge scenarios. It'd probably be reasonable for UNION/CASE to copy the input typmod if the alternatives all agree on the type and typmod. But solving the general problem would be a lot of work of dubious value. regards, tom lane
> Peter Eisentraut <peter_e@gmx.net> writes: > > Tom Lane writes: > >> CREATE TABLE AS cannot be expected to be able to extract a suitable > >> typmod from complex expressions. > > > I don't think that would be entirely unreasonable. > > Well, it might not be completely impossible, but I think it's well on > the far side of unreasonable. For *every operator* that produces a > result of any of the typmod-using types, we'd have to maintain an > auxiliary bit of code that can predict the result typmod. That's > a lot of code, and when you start considering user-defined functions > it gets worse. And all for what? Not to do anything useful, but only > to *eliminate* functionality. Perhaps char without typmod is > unnecessary (since it reduces to text), but numeric without typmod seems > highly useful to me. > > Strikes me as a very large amount of work to go in the wrong > direction... Added to TODO: * CREATE TABLE AS can not determine column lengths from expressions Seems it should be documented. Do we throw an error in these cases? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Added to TODO: > * CREATE TABLE AS can not determine column lengths from expressions > Seems it should be documented. Do we throw an error in these cases? No. What we do right now is to generate non-length-constrained column types for the created table. Your TODO item is too pessimistic: we *do* determine the column length in simple cases. For example: regression=# create table foo (f1 char(3)); CREATE regression=# create table bar as select * from foo; SELECT regression=# \d bar Table "bar"Column | Type | Modifiers --------+--------------+-----------f1 | character(3) | However, in more complex cases we don't know the column length: regression=# create table baz as select f1 || 'z' as f1 from foo; SELECT regression=# \d baz Table "baz"Column | Type | Modifiers --------+--------+-----------f1 | bpchar | The argument here is about how much intelligence it's reasonable to expect the system to have. It's very clearly not feasible to derive a length limit automagically in every case. How hard should we try? regards, tom lane
> regression=# create table baz as select f1 || 'z' as f1 from foo; > SELECT > regression=# \d baz > Table "baz" > Column | Type | Modifiers > --------+--------+----------- > f1 | bpchar | > > The argument here is about how much intelligence it's reasonable to > expect the system to have. It's very clearly not feasible to derive > a length limit automagically in every case. How hard should we try? I don't think we can try in this case, especially because our functions are all burried down in adt/. However, I don't think creating a bpchar with no length is a proper solution. Should we just punt to text in these cases? Seems cleaner, perhaps even throw an elog(NOTICE) mentioning the promotion to text. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > However, I don't think creating a bpchar > with no length is a proper solution. Should we just punt to text in > these cases? How many special cases like that do you want to put into the allegedly datatype-independent CREATE TABLE code? If I thought this were the only case then I'd not object ... but it looks like a slippery slope from here. And --- it's not like replacing "bpchar" with "text" actually buys us any useful new functionality. AFAICS it's just a cosmetic thing. regards, tom lane PS: On the other hand, we might consider attacking the problem from the reverse direction, ie *removing* code. For example, if there weren't redundant || operators for char and varchar, then every || operation would yield text, and the example we're looking at would work the way you want for free. I've thought for awhile that we could use a pass through pg_proc and pg_operator to remove some entries we don't really need.
> How many special cases like that do you want to put into the allegedly > datatype-independent CREATE TABLE code? > > If I thought this were the only case then I'd not object ... but it > looks like a slippery slope from here. > > And --- it's not like replacing "bpchar" with "text" actually buys us > any useful new functionality. AFAICS it's just a cosmetic thing. > > regards, tom lane > > PS: On the other hand, we might consider attacking the problem from > the reverse direction, ie *removing* code. For example, if there > weren't redundant || operators for char and varchar, then every || > operation would yield text, and the example we're looking at would > work the way you want for free. I've thought for awhile that we > could use a pass through pg_proc and pg_operator to remove some > entries we don't really need. Can we convert bpchar to text in create table if no typmod is supplied? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane writes: > The argument here is about how much intelligence it's reasonable to > expect the system to have. It's very clearly not feasible to derive > a length limit automagically in every case. How hard should we try? I would like to know what Proprietary database #1 does with CREATE TABLE one ( a bit(6) ); INSERT INTO one VALUES ( b'101101' ); CREATE TABLE two ( b bit(4) ); INSERT INTO two VALUES ( b'0110' ); CREATE TABLE three AS SELECT a FROM one UNION SELECT b FROM two; According to SQL92, clause 9.3, the result type of the union is bit(6). However, it's not possible to store a bit(4) value into a bit(6) field. Our current solution, "bit(<nothing>)" is even worse because it has no real semantics at all (but you can store bit(<anything>) in it, interestingly). -- Peter Eisentraut peter_e@gmx.net
Thread added to TODO.detail. > Tom Lane writes: > > > The argument here is about how much intelligence it's reasonable to > > expect the system to have. It's very clearly not feasible to derive > > a length limit automagically in every case. How hard should we try? > > I would like to know what Proprietary database #1 does with > > CREATE TABLE one ( a bit(6) ); > INSERT INTO one VALUES ( b'101101' ); > CREATE TABLE two ( b bit(4) ); > INSERT INTO two VALUES ( b'0110' ); > CREATE TABLE three AS SELECT a FROM one UNION SELECT b FROM two; > > According to SQL92, clause 9.3, the result type of the union is bit(6). > However, it's not possible to store a bit(4) value into a bit(6) field. > Our current solution, "bit(<nothing>)" is even worse because it has no > real semantics at all (but you can store bit(<anything>) in it, > interestingly). > > -- > Peter Eisentraut peter_e@gmx.net > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026