Home > mailing lists

Oops, I seem to have changed UNION's behavior - Mailing list pgsql-hackers

From	Tom Lane
Subject	Oops, I seem to have changed UNION's behavior
Date	February 7, 1999 13:47:24
Msg-id	5172.918413202@sss.pgh.pa.us Whole thread Raw
Responses	Re: [HACKERS] Oops, I seem to have changed UNION's behavior Re: [HACKERS] Oops, I seem to have changed UNION's behavior Re: [HACKERS] Oops, I seem to have changed UNION's behavior
List	pgsql-hackers

Tree view

The equal() updates I installed yesterday (to fix the "don't know
whether nodes of type 600 are equal" problem) have had an unintended
side effect.

Am I right in thinking that UNION (without ALL) is defined to do a
DISTINCT on its result, so that duplicates are removed even if the
duplicates both came from the same source table?  That's what 6.4.2
does, but I do not know if it's strictly kosher according to the SQL
spec.

If so, the code is now busted, because with the equal() extension in
place, cnfify() is able to recognize and remove duplicate select
clauses.  That is, "SELECT xxx UNION SELECT xxx" will be folded to
just "SELECT xxx" ... and that doesn't mean the same thing.

An actual example: given the data

play=> select a from tt;
a
-
1
1
2
3
(4 rows)

Under 6.4.2 I get:

play=> select a from tt union select a from tt;
a
-
1
2
3
(3 rows)

Note lack of duplicate "1".  Under current sources I get:

ttest=> select a from tt union select a from tt;
a
-
1
1
2
3
(4 rows)

since the query is effectively reduced to just "select a from tt".

Assuming that 6.4.2 is doing the Right Thing, I see two possible fixes:
(1) simplify equal() to say that two T_Query nodes are never equal, or
(2) modify the planner so that the "select distinct" operation is
inserted explicitly, and will thus happen even if the UNIONed selects
are collapsed into just one.

(1) is a trivial fix of course, but it worries me --- maybe someday
we will need equal() to give an honest answer for Query nodes.
But I don't have the expertise to apply (2), and it seems like rather
a lot of work for a boundary case that isn't really interesting in
practice.

Comments?  *Is* 6.4.2 behaving according to the SQL spec?
        regards, tom lane

pgsql-hackers by date:

From: "Stupor Genius"
Date: 07 February 1999, 13:39:25
Subject: RE: [HACKERS] Problems with >2GB tables on Linux 2.0

From: jwieck@debis.com (Jan Wieck)
Date: 07 February 1999, 14:36:56
Subject: Re: [HACKERS] trouble with rules

Oops, I seem to have changed UNION's behavior - Mailing list pgsql-hackers

Previous

Next