Thread: UNION with more than 2 branches
The resolution to my problem with the select_common_type() error message turned out to be that this doesn't work: postgres=# select null union select null union select 1; ERROR: UNION types text and integer cannot be matched That's because it resolves the first two branches independently, then defaults to text if it can't find anything better, and then tries to match text to the integer in the third branch. This should probably be fixed sometime. Maybe make a note in the TODO list? -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > The resolution to my problem with the select_common_type() error message > turned out to be that this doesn't work: > postgres=# select null union select null union select 1; > ERROR: UNION types text and integer cannot be matched Yeah, this has been noted before. The sticking point is that it's not clear that resolving types across more than two branches at a time is legal per SQL spec. The spec defines UNION as exactly two at a time, ie the above is really(select null union select null) union select 1; and there is not any language that would justify allowing the "1" to determine the data type of the inner UNION. It would not be all that important in a UNION ALL case, maybe, but for UNION the assigned data type determines what values are considered duplicates, and thus can have real impact on the results. Maybe we should just ignore those qualms and do it anyway --- I must admit that I'm hard-pressed to come up with a situation where anyone would really want different datatypes used in the inner union than the outer. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Maybe we should just ignore those qualms and do it anyway --- I must > admit that I'm hard-pressed to come up with a situation where anyone > would really want different datatypes used in the inner union than > the outer. Does it even matter except in the case of nulls? I mean, if the inner pair uses integer and then the outer pair uses bigint it'll still work correctly, no? What would happen if the inner pair defaulted null to "unknown" instead of text? Then the next level would have a chance to union between unknown and integer successfully. It's a bit odd that that's basically what happens currently *except* for in unions: postgres=# create table foo as select null; WARNING: column "?column?" has type "unknown" DETAIL: Proceeding with relation creation anyway. SELECT postgres=# create table bar as select null union all select null; SELECT postgres=# \d foo Table "public.foo" Column | Type | Modifiers ----------+-----------+-----------?column? | "unknown" | postgres=# \d bar Table "public.bar" Column | Type | Modifiers ----------+------+-----------?column? | text | -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> Maybe we should just ignore those qualms and do it anyway --- I must >> admit that I'm hard-pressed to come up with a situation where anyone >> would really want different datatypes used in the inner union than >> the outer. > Does it even matter except in the case of nulls? I mean, if the inner pair > uses integer and then the outer pair uses bigint it'll still work correctly, > no? Oh, it absolutely matters: you can get different answers. Consider (select '1' union select ' 1') union all select 1; Ignoring the point that we have no implicit integer/text cast, this would yield three rows if the inner union is treated as text, vs two rows if it's treated as integer. Likewise, '1.0' is different from '1' according to some datatypes and not others. The urgency of this objection decreases greatly if we get rid of all the implicit cross-type-category casts, I think. Offhand the only trouble case I can come up with without using a cross-category conversion is trailing blanks in char vs text/varchar. > What would happen if the inner pair defaulted null to "unknown" instead of > text? You're missing the point, which is that the inner UNION needs to decide what its uniqueness semantics are, independently of what might happen to its result later. Or that's how I read the spec anyway. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: > >> Does it even matter except in the case of nulls? I mean, if the inner pair >> uses integer and then the outer pair uses bigint it'll still work correctly, >> no? > > Oh, it absolutely matters: you can get different answers. Consider > > (select '1' union select ' 1') union all select 1; Ah. >> What would happen if the inner pair defaulted null to "unknown" instead of >> text? > > You're missing the point, which is that the inner UNION needs to decide > what its uniqueness semantics are, independently of what might happen to > its result later. Or that's how I read the spec anyway. Ah of course. We wouldn't be able to unionize "unknown" all. hmph. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Am Dienstag, 24. April 2007 19:48 schrieb Tom Lane: > You're missing the point, which is that the inner UNION needs to decide > what its uniqueness semantics are, independently of what might happen to > its result later. Or that's how I read the spec anyway. It's probably safer to leave it as is. Maybe there could be a way to make the error message more understandable ("Where did the 'text' come from?"), but perhaps the additional detail discussed in the other thread will do that. -- Peter Eisentraut http://developer.postgresql.org/~petere/