Thread: UNION with more than 2 branches

UNION with more than 2 branches

From
Peter Eisentraut
Date:
The resolution to my problem with the select_common_type() error message 
turned out to be that this doesn't work:

postgres=# select null union select null union select 1;
ERROR:  UNION types text and integer cannot be matched

That's because it resolves the first two branches independently, then defaults 
to text if it can't find anything better, and then tries to match text to the 
integer in the third branch.

This should probably be fixed sometime.  Maybe make a note in the TODO list?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: UNION with more than 2 branches

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> The resolution to my problem with the select_common_type() error message 
> turned out to be that this doesn't work:

> postgres=# select null union select null union select 1;
> ERROR:  UNION types text and integer cannot be matched

Yeah, this has been noted before.  The sticking point is that it's not
clear that resolving types across more than two branches at a time is
legal per SQL spec.  The spec defines UNION as exactly two at a time,
ie the above is really(select null union select null) union select 1;
and there is not any language that would justify allowing the "1" to
determine the data type of the inner UNION.  It would not be all that
important in a UNION ALL case, maybe, but for UNION the assigned data
type determines what values are considered duplicates, and thus can
have real impact on the results.

Maybe we should just ignore those qualms and do it anyway --- I must
admit that I'm hard-pressed to come up with a situation where anyone
would really want different datatypes used in the inner union than
the outer.
        regards, tom lane


Re: UNION with more than 2 branches

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Maybe we should just ignore those qualms and do it anyway --- I must
> admit that I'm hard-pressed to come up with a situation where anyone
> would really want different datatypes used in the inner union than
> the outer.

Does it even matter except in the case of nulls? I mean, if the inner pair
uses integer and then the outer pair uses bigint it'll still work correctly,
no?

What would happen if the inner pair defaulted null to "unknown" instead of
text? Then the next level would have a chance to union between unknown and
integer successfully.

It's a bit odd that that's basically what happens currently *except* for in
unions:

postgres=# create table foo as select null;
WARNING:  column "?column?" has type "unknown"
DETAIL:  Proceeding with relation creation anyway.
SELECT
postgres=# create table bar as select null union all select null;
SELECT
postgres=# \d foo       Table "public.foo" Column  |   Type    | Modifiers 
----------+-----------+-----------?column? | "unknown" | 

postgres=# \d bar    Table "public.bar" Column  | Type | Modifiers 
----------+------+-----------?column? | text | 



--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: UNION with more than 2 branches

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Maybe we should just ignore those qualms and do it anyway --- I must
>> admit that I'm hard-pressed to come up with a situation where anyone
>> would really want different datatypes used in the inner union than
>> the outer.

> Does it even matter except in the case of nulls? I mean, if the inner pair
> uses integer and then the outer pair uses bigint it'll still work correctly,
> no?

Oh, it absolutely matters: you can get different answers.  Consider
(select '1' union select ' 1') union all select 1;

Ignoring the point that we have no implicit integer/text cast, this
would yield three rows if the inner union is treated as text, vs
two rows if it's treated as integer.  Likewise, '1.0' is different from
'1' according to some datatypes and not others.

The urgency of this objection decreases greatly if we get rid of all the
implicit cross-type-category casts, I think.  Offhand the only trouble
case I can come up with without using a cross-category conversion is
trailing blanks in char vs text/varchar.

> What would happen if the inner pair defaulted null to "unknown" instead of
> text?

You're missing the point, which is that the inner UNION needs to decide
what its uniqueness semantics are, independently of what might happen to
its result later.  Or that's how I read the spec anyway.
        regards, tom lane


Re: UNION with more than 2 branches

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>
>> Does it even matter except in the case of nulls? I mean, if the inner pair
>> uses integer and then the outer pair uses bigint it'll still work correctly,
>> no?
>
> Oh, it absolutely matters: you can get different answers.  Consider
>
>     (select '1' union select ' 1') union all select 1;

Ah.

>> What would happen if the inner pair defaulted null to "unknown" instead of
>> text?
>
> You're missing the point, which is that the inner UNION needs to decide
> what its uniqueness semantics are, independently of what might happen to
> its result later.  Or that's how I read the spec anyway.

Ah of course. We wouldn't be able to unionize "unknown" all. hmph.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: UNION with more than 2 branches

From
Peter Eisentraut
Date:
Am Dienstag, 24. April 2007 19:48 schrieb Tom Lane:
> You're missing the point, which is that the inner UNION needs to decide
> what its uniqueness semantics are, independently of what might happen to
> its result later.  Or that's how I read the spec anyway.

It's probably safer to leave it as is.  Maybe there could be a way to make the 
error message more understandable ("Where did the 'text' come from?"), but 
perhaps the additional detail discussed in the other thread will do that.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/