The equal() updates I installed yesterday (to fix the "don't know
whether nodes of type 600 are equal" problem) have had an unintended
side effect.
Am I right in thinking that UNION (without ALL) is defined to do a
DISTINCT on its result, so that duplicates are removed even if the
duplicates both came from the same source table? That's what 6.4.2
does, but I do not know if it's strictly kosher according to the SQL
spec.
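A minimal sketch of the semantics being asked about, assuming a single-column table is modeled as a Python list of values (a bag). Under the reading that 6.4.2 implements, UNION is UNION ALL followed by DISTINCT over the whole result, so a duplicate disappears even when both copies come from the same input. The function names are hypothetical, and row order is not meaningful in SQL; first-occurrence order is kept only to make the output deterministic.

```python
from typing import Hashable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def union_all(left: List[T], right: List[T]) -> List[T]:
    """UNION ALL: keep every row from both inputs, duplicates included."""
    return left + right


def distinct(rows: List[T]) -> List[T]:
    """DISTINCT: drop repeated rows, keeping the first occurrence of each."""
    return list(dict.fromkeys(rows))


def union(left: List[T], right: List[T]) -> List[T]:
    """UNION without ALL, modeled as DISTINCT over the whole UNION ALL result."""
    return distinct(union_all(left, right))


if __name__ == "__main__":
    tt = [1, 1, 2, 3]  # column a of table tt, from the example in this message
    print(union_all(tt, tt))  # 8 rows: both copies of tt, duplicates kept
    print(union(tt, tt))      # [1, 2, 3]: even the 1 repeated within tt is removed
```

Under this model, union(tt, []) also returns [1, 2, 3], which is exactly the "duplicates from the same source table" case in question.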
If so, the code is now busted, because with the equal() extension in
place, cnfify() is able to recognize and remove duplicate select
clauses. That is, "SELECT xxx UNION SELECT xxx" will be folded to
just "SELECT xxx" ... and that doesn't mean the same thing.
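The failure mode can be sketched with a toy model. This is a hypothetical illustration, not the planner's actual code paths: each UNION branch is assumed to be just the name of the table it scans, equal branches are folded the way a duplicate-clause remover would fold them, and the implicit DISTINCT is assumed to be reached only when two or more branches remain.

```python
from typing import Dict, List

# Hypothetical model: a UNION branch is the name of the table it scans,
# and "tables" maps each name to the rows of its single column.
Tables = Dict[str, List[int]]


def fold_equal_branches(branches: List[str]) -> List[str]:
    """Drop branches that compare equal, as a duplicate-clause remover would."""
    return list(dict.fromkeys(branches))


def run_union_naive(branches: List[str], tables: Tables) -> List[int]:
    """Buggy plan: the implicit DISTINCT lives in the multi-branch path only."""
    folded = fold_equal_branches(branches)
    if len(folded) == 1:
        # The UNION has disappeared, and its DISTINCT disappeared with it.
        return list(tables[folded[0]])
    rows: List[int] = []
    for branch in folded:
        rows.extend(tables[branch])
    return list(dict.fromkeys(rows))  # DISTINCT, but only reached with 2+ branches


if __name__ == "__main__":
    tables = {"tt": [1, 1, 2, 3]}
    print(run_union_naive(["tt", "tt"], tables))  # [1, 1, 2, 3]: duplicate 1 survives
```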
An actual example: given the data
play=> select a from tt;
a
-
1
1
2
3
(4 rows)
Under 6.4.2 I get:
play=> select a from tt union select a from tt;
a
-
1
2
3
(3 rows)
Note the absence of a duplicate "1". Under current sources I get:
ttest=> select a from tt union select a from tt;
a
-
1
1
2
3
(4 rows)
since the query is effectively reduced to just "select a from tt".
Assuming that 6.4.2 is doing the Right Thing, I see two possible fixes:
(1) simplify equal() to say that two T_Query nodes are never equal, or
(2) modify the planner so that the "select distinct" operation is
inserted explicitly, and will thus happen even if the UNIONed selects
are collapsed into just one.
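Option (2) can be sketched in the same kind of toy model (hypothetical names; each branch is assumed to be the name of the table it scans). The point it illustrates is that folding equal branches is harmless once DISTINCT is applied unconditionally, because distinct(A + A) equals distinct(A).

```python
from typing import Dict, List

# Hypothetical model: a UNION branch is the name of the table it scans.
Tables = Dict[str, List[int]]


def distinct(rows: List[int]) -> List[int]:
    """Remove duplicate rows, keeping first occurrences in order."""
    return list(dict.fromkeys(rows))


def run_union_with_explicit_distinct(branches: List[str], tables: Tables) -> List[int]:
    """Fold equal branches, but always finish with an explicit DISTINCT.

    Dropping a repeated branch cannot change the result here, since
    DISTINCT runs whether one branch or many survive the folding.
    """
    folded = list(dict.fromkeys(branches))  # collapse branches that compare equal
    rows: List[int] = []
    for branch in folded:
        rows.extend(tables[branch])
    return distinct(rows)  # applied even when only one branch is left


if __name__ == "__main__":
    tables = {"tt": [1, 1, 2, 3]}
    print(run_union_with_explicit_distinct(["tt", "tt"], tables))  # [1, 2, 3]
```

Option (1) corresponds to a branch comparison that never reports two queries as equal, so no folding happens; the result is then also correct, but only because the repeated branch is evaluated and its rows are discarded afterwards.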
(1) is a trivial fix of course, but it worries me --- maybe someday
we will need equal() to give an honest answer for Query nodes.
But I don't have the expertise to implement (2), and it seems like rather
a lot of work for a boundary case that isn't really interesting in
practice.
Comments? *Is* 6.4.2 behaving according to the SQL spec?
regards, tom lane