Thread: Re: [BUGS] Bug #513: union all changes char(3) column definition

Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Tom Lane
Date:
Rae Stiening (stiening@cannon.astro.umass.edu) writes:
> This script demonstrates the incorrect restoration of a
> table created by a union under postgresql version 7.1.3.

What's really going on here is that

1. The CREATE TABLE AS command creates a column with type bpchar and  typmod -1 (ie, no specific length enforced).

2. pg_dump dumps this column with the type identified as "character".

3. On reload, "character" is interpreted as "character(1)".

While each of these behaviors is justifiable to some degree when
considered by itself, their interaction is not good.  It is actually
not possible for pg_dump to dump this table correctly, because there
is no CREATE TABLE command it can give to reproduce the type/typmod
combination.

I thought a little bit about trying to disallow the creation of such
tables, but I don't believe that can work in the general case.
CREATE TABLE AS cannot be expected to be able to extract a suitable
typmod from complex expressions.  We could think about replacing
"bpchar/-1" with "text", but that only fixes the problem for bpchar;
we have the exact same issue with numeric, and there is no comparable
workaround for numeric.

So I think what we need to do is rejigger the type display and entry
rules so that there is a recognized representation for "bpchar with
no typmod", "numeric with no typmod", etc, and the parser will not
bogusly insert default length limits when it sees this representation.

For char I propose that this representation be    bpchar
ie the underlying type name.  This is a bit ugly, but since the notion
of char(n) with no particular limit is definitely non-SQL92 anyway,
using a non-SQL name seems appropriate.

For varchar, it already works to write any of    varchar    char varying    character varying
This does not conflict with SQL92 since the standard doesn't allow the
length spec to be omitted in these types, and so there's not an expected
default of 1 as there is for char.

For numeric, we could say that the representation for typmod -1 is    "numeric"
(double quotes required) ... but I really wonder why we have the
convention that numeric defaults to numeric(30,6) in the first place.
Why shouldn't the default behavior be to use typmod -1 (no limit)?
The (30,6) convention cannot be justified on the basis of the SQL spec;
it says the default precision is implementation-defined and the default
scale is zero.  "No limit" looks like a good enough
implementation-defined precision value to me, and as for the scale,
defaulting to no scale adjustment is less likely to make anyone unhappy
than defaulting to scale zero.  So I propose that we remove all notion
of a default precision and scale for numeric, and say that numeric
written without a precision/scale spec means numeric with typmod -1.

For bit, the SQL spec requires us to interpret unadorned bit as meaning
bit(1), so there seems little choice but to use    "bit"
(quotes required) for the typmod -1 case.

For varbit, "bit varying" already works and need not be messed with;
same rationale as for varchar.


As far as implementation goes, on the output side all that's needed is
some simple changes in format_type to emit the desired representation.
In the parser, we need to remove transformColumnType's diddling of
typmod and instead insert the correct default typmod in gram.y.  We
can't do it later than gram.y since the distinction between "bit" and
bit, etc, is not visible later.

Comments?
        regards, tom lane


Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Peter Eisentraut
Date:
Tom Lane writes:

> CREATE TABLE AS cannot be expected to be able to extract a suitable
> typmod from complex expressions.

I don't think that would be entirely unreasonable.  The current system
drops typmods at first sight when it gets scared of them, but in many
cases it would make sense for them to propagate much further.

We've already seen a case where "no typmod" means different things in
different places for lack of a good way to keep the information.  If we
ever want to allow user-defined data types to have atttypmods a solution
would be necessary.

Here's another example where the behaviour is not consistent with other
places:

peter=# create table one (a bit(4));
CREATE
peter=# create table two (b bit(6));
CREATE
peter=# insert into one values (b'1001');
INSERT 16570 1
peter=# insert into two values (b'011110');
INSERT 16571 1
peter=# select * from one union select * from two;
011110
1001

What's the data type of that?  The fact is that bit without typmod makes
no sense, even less so than char without typmod.

A possible solution would be that data types can register a
typmod-resolver function, which takes two typmods and returns the typmod
to make both expressions union-compatible.  For varchar(n) and varchar(m)
is would return max(m,n), for bit(n) and bit(m) it would return an error
if m<>n.  (The behaviour of char() could be either of these two.)

Surely a long-term idea though...

-- 
Peter Eisentraut   peter_e@gmx.net



Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> CREATE TABLE AS cannot be expected to be able to extract a suitable
>> typmod from complex expressions.

> I don't think that would be entirely unreasonable.

Well, it might not be completely impossible, but I think it's well on
the far side of unreasonable.  For *every operator* that produces a
result of any of the typmod-using types, we'd have to maintain an
auxiliary bit of code that can predict the result typmod.  That's
a lot of code, and when you start considering user-defined functions
it gets worse.  And all for what?  Not to do anything useful, but only
to *eliminate* functionality.  Perhaps char without typmod is
unnecessary (since it reduces to text), but numeric without typmod seems
highly useful to me.

Strikes me as a very large amount of work to go in the wrong
direction...

> A possible solution would be that data types can register a
> typmod-resolver function, which takes two typmods and returns the typmod
> to make both expressions union-compatible.

This only handles the UNION and CASE merge scenarios.

It'd probably be reasonable for UNION/CASE to copy the input typmod
if the alternatives all agree on the type and typmod.  But solving
the general problem would be a lot of work of dubious value.
        regards, tom lane


Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Bruce Momjian
Date:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Tom Lane writes:
> >> CREATE TABLE AS cannot be expected to be able to extract a suitable
> >> typmod from complex expressions.
> 
> > I don't think that would be entirely unreasonable.
> 
> Well, it might not be completely impossible, but I think it's well on
> the far side of unreasonable.  For *every operator* that produces a
> result of any of the typmod-using types, we'd have to maintain an
> auxiliary bit of code that can predict the result typmod.  That's
> a lot of code, and when you start considering user-defined functions
> it gets worse.  And all for what?  Not to do anything useful, but only
> to *eliminate* functionality.  Perhaps char without typmod is
> unnecessary (since it reduces to text), but numeric without typmod seems
> highly useful to me.
> 
> Strikes me as a very large amount of work to go in the wrong
> direction...

Added to TODO:
* CREATE TABLE AS can not determine column lengths from expressions


Seems it should be documented.  Do we throw an error in these cases?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Added to TODO:
>     * CREATE TABLE AS can not determine column lengths from expressions
> Seems it should be documented.  Do we throw an error in these cases?

No.  What we do right now is to generate non-length-constrained column
types for the created table.

Your TODO item is too pessimistic: we *do* determine the column length
in simple cases.  For example:

regression=# create table foo (f1 char(3));
CREATE
regression=# create table bar as select * from foo;
SELECT
regression=# \d bar           Table "bar"Column |     Type     | Modifiers
--------+--------------+-----------f1     | character(3) |

However, in more complex cases we don't know the column length:

regression=# create table baz as select f1 || 'z' as f1 from foo;
SELECT
regression=# \d baz        Table "baz"Column |  Type  | Modifiers
--------+--------+-----------f1     | bpchar |

The argument here is about how much intelligence it's reasonable to
expect the system to have.  It's very clearly not feasible to derive
a length limit automagically in every case.  How hard should we try?
        regards, tom lane


Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Bruce Momjian
Date:
> regression=# create table baz as select f1 || 'z' as f1 from foo;
> SELECT
> regression=# \d baz
>          Table "baz"
>  Column |  Type  | Modifiers
> --------+--------+-----------
>  f1     | bpchar |
> 
> The argument here is about how much intelligence it's reasonable to
> expect the system to have.  It's very clearly not feasible to derive
> a length limit automagically in every case.  How hard should we try?

I don't think we can try in this case, especially because our functions
are all burried down in adt/.  However, I don't think creating a bpchar
with no length is a proper solution.  Should we just punt to text in
these cases?  Seems cleaner, perhaps even throw an elog(NOTICE)
mentioning the promotion to text.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> However, I don't think creating a bpchar
> with no length is a proper solution.  Should we just punt to text in
> these cases?

How many special cases like that do you want to put into the allegedly
datatype-independent CREATE TABLE code?

If I thought this were the only case then I'd not object ... but it
looks like a slippery slope from here.

And --- it's not like replacing "bpchar" with "text" actually buys us
any useful new functionality.  AFAICS it's just a cosmetic thing.
        regards, tom lane

PS: On the other hand, we might consider attacking the problem from
the reverse direction, ie *removing* code.  For example, if there
weren't redundant || operators for char and varchar, then every ||
operation would yield text, and the example we're looking at would
work the way you want for free.  I've thought for awhile that we
could use a pass through pg_proc and pg_operator to remove some
entries we don't really need.


Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Bruce Momjian
Date:
> How many special cases like that do you want to put into the allegedly
> datatype-independent CREATE TABLE code?
> 
> If I thought this were the only case then I'd not object ... but it
> looks like a slippery slope from here.
> 
> And --- it's not like replacing "bpchar" with "text" actually buys us
> any useful new functionality.  AFAICS it's just a cosmetic thing.
> 
>             regards, tom lane
> 
> PS: On the other hand, we might consider attacking the problem from
> the reverse direction, ie *removing* code.  For example, if there
> weren't redundant || operators for char and varchar, then every ||
> operation would yield text, and the example we're looking at would
> work the way you want for free.  I've thought for awhile that we
> could use a pass through pg_proc and pg_operator to remove some
> entries we don't really need.

Can we convert bpchar to text in create table if no typmod is supplied?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Peter Eisentraut
Date:
Tom Lane writes:

> The argument here is about how much intelligence it's reasonable to
> expect the system to have.  It's very clearly not feasible to derive
> a length limit automagically in every case.  How hard should we try?

I would like to know what Proprietary database #1 does with

CREATE TABLE one ( a bit(6) );
INSERT INTO one VALUES ( b'101101' );
CREATE TABLE two ( b bit(4) );
INSERT INTO two VALUES ( b'0110' );
CREATE TABLE three AS SELECT a FROM one UNION SELECT b FROM two;

According to SQL92, clause 9.3, the result type of the union is bit(6).
However, it's not possible to store a bit(4) value into a bit(6) field.
Our current solution, "bit(<nothing>)" is even worse because it has no
real semantics at all (but you can store bit(<anything>) in it,
interestingly).

-- 
Peter Eisentraut   peter_e@gmx.net



Re: [BUGS] Bug #513: union all changes char(3) column definition

From
Bruce Momjian
Date:
Thread added to TODO.detail.

> Tom Lane writes:
> 
> > The argument here is about how much intelligence it's reasonable to
> > expect the system to have.  It's very clearly not feasible to derive
> > a length limit automagically in every case.  How hard should we try?
> 
> I would like to know what Proprietary database #1 does with
> 
> CREATE TABLE one ( a bit(6) );
> INSERT INTO one VALUES ( b'101101' );
> CREATE TABLE two ( b bit(4) );
> INSERT INTO two VALUES ( b'0110' );
> CREATE TABLE three AS SELECT a FROM one UNION SELECT b FROM two;
> 
> According to SQL92, clause 9.3, the result type of the union is bit(6).
> However, it's not possible to store a bit(4) value into a bit(6) field.
> Our current solution, "bit(<nothing>)" is even worse because it has no
> real semantics at all (but you can store bit(<anything>) in it,
> interestingly).
> 
> -- 
> Peter Eisentraut   peter_e@gmx.net
> 
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026