Oops, I seem to have changed UNION's behavior - Mailing list pgsql-hackers

From Tom Lane
Subject Oops, I seem to have changed UNION's behavior
Date
Msg-id 5172.918413202@sss.pgh.pa.us
Whole thread Raw
Responses Re: [HACKERS] Oops, I seem to have changed UNION's behavior  (Oleg Broytmann <phd@sun.med.ru>)
Re: [HACKERS] Oops, I seem to have changed UNION's behavior  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [HACKERS] Oops, I seem to have changed UNION's behavior  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
The equal() updates I installed yesterday (to fix the "don't know
whether nodes of type 600 are equal" problem) have had an unintended
side effect.

Am I right in thinking that UNION (without ALL) is defined to do a
DISTINCT on its result, so that duplicates are removed even if the
duplicates both came from the same source table?  That's what 6.4.2
does, but I do not know if it's strictly kosher according to the SQL
spec.

If so, the code is now busted, because with the equal() extension in
place, cnfify() is able to recognize and remove duplicate select
clauses.  That is, "SELECT xxx UNION SELECT xxx" will be folded to
just "SELECT xxx" ... and that doesn't mean the same thing.

An actual example: given the data

play=> select a from tt;
a
-
1
1
2
3
(4 rows)

Under 6.4.2 I get:

play=> select a from tt union select a from tt;
a
-
1
2
3
(3 rows)

Note lack of duplicate "1".  Under current sources I get:

ttest=> select a from tt union select a from tt;
a
-
1
1
2
3
(4 rows)

since the query is effectively reduced to just "select a from tt".

Assuming that 6.4.2 is doing the Right Thing, I see two possible fixes:
(1) simplify equal() to say that two T_Query nodes are never equal, or
(2) modify the planner so that the "select distinct" operation is
inserted explicitly, and will thus happen even if the UNIONed selects
are collapsed into just one.

(1) is a trivial fix of course, but it worries me --- maybe someday
we will need equal() to give an honest answer for Query nodes.
But I don't have the expertise to apply (2), and it seems like rather
a lot of work for a boundary case that isn't really interesting in
practice.

Comments?  *Is* 6.4.2 behaving according to the SQL spec?
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Stupor Genius"
Date:
Subject: RE: [HACKERS] Problems with >2GB tables on Linux 2.0
Next
From: jwieck@debis.com (Jan Wieck)
Date:
Subject: Re: [HACKERS] trouble with rules