Thread: Oops, I seem to have changed UNION's behavior

Oops, I seem to have changed UNION's behavior

From
Tom Lane
Date:
The equal() updates I installed yesterday (to fix the "don't know
whether nodes of type 600 are equal" problem) have had an unintended
side effect.

Am I right in thinking that UNION (without ALL) is defined to do a
DISTINCT on its result, so that duplicates are removed even if the
duplicates both came from the same source table?  That's what 6.4.2
does, but I do not know if it's strictly kosher according to the SQL
spec.

If so, the code is now busted, because with the equal() extension in
place, cnfify() is able to recognize and remove duplicate select
clauses.  That is, "SELECT xxx UNION SELECT xxx" will be folded to
just "SELECT xxx" ... and that doesn't mean the same thing.

An actual example: given the data

play=> select a from tt;
a
-
1
1
2
3
(4 rows)

Under 6.4.2 I get:

play=> select a from tt union select a from tt;
a
-
1
2
3
(3 rows)

Note lack of duplicate "1".  Under current sources I get:

ttest=> select a from tt union select a from tt;
a
-
1
1
2
3
(4 rows)

since the query is effectively reduced to just "select a from tt".

Assuming that 6.4.2 is doing the Right Thing, I see two possible fixes:
(1) simplify equal() to say that two T_Query nodes are never equal, or
(2) modify the planner so that the "select distinct" operation is
inserted explicitly, and will thus happen even if the UNIONed selects
are collapsed into just one.

(1) is a trivial fix of course, but it worries me --- maybe someday
we will need equal() to give an honest answer for Query nodes.
But I don't have the expertise to apply (2), and it seems like rather
a lot of work for a boundary case that isn't really interesting in
practice.

Comments?  *Is* 6.4.2 behaving according to the SQL spec?
        regards, tom lane


Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From
Oleg Broytmann
Date:
Hello!

On Sun, 7 Feb 1999, Tom Lane wrote:
> The equal() updates I installed yesterday (to fix the "don't know
> whether nodes of type 600 are equal" problem) have had an unintended
> side effect.
> 
> Am I right in thinking that UNION (without ALL) is defined to do a
> DISTINCT on its result, so that duplicates are removed even if the
> duplicates both came from the same source table?  That's what 6.4.2
> does, but I do not know if it's strictly kosher according to the SQL
> spec.
  Yes, this is standard. My books (primary, Gruber) say UNION should work
this way - UNION without ALL implies DISTINCT.

> If so, the code is now busted, because with the equal() extension in
> place, cnfify() is able to recognize and remove duplicate select
> clauses.  That is, "SELECT xxx UNION SELECT xxx" will be folded to
> just "SELECT xxx" ... and that doesn't mean the same thing.
> 
> An actual example: given the data
> 
> play=> select a from tt;
> a
> -
> 1
> 1
> 2
> 3
> (4 rows)
> 
> Under 6.4.2 I get:
> 
> play=> select a from tt union select a from tt;
> a
> -
> 1
> 2
> 3
> (3 rows)
> 
> Note lack of duplicate "1".  Under current sources I get:
> 
> ttest=> select a from tt union select a from tt;
> a
> -
> 1
> 1
> 2
> 3
> (4 rows)
> 
> since the query is effectively reduced to just "select a from tt".
  I am sure my books did not consider such case as UNION that could be
otimized this way. Not sure what is Right Thing here...

Oleg.
----    Oleg Broytmann  National Research Surgery Centre  http://sun.med.ru/~phd/          Programmers don't die, they
justGOSUB without RETURN.
 



Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From
"Thomas G. Lockhart"
Date:
> The equal() updates I installed yesterday (to fix the "don't know
> whether nodes of type 600 are equal" problem) have had an unintended
> side effect.
> Am I right in thinking that UNION (without ALL) is defined to do a
> DISTINCT on its result, so that duplicates are removed even if the
> duplicates both came from the same source table?  That's what 6.4.2
> does, but I do not know if it's strictly kosher according to the SQL
> spec.

Yup, that's the way it should be...

> If so, the code is now busted, because with the equal() extension in
> place, cnfify() is able to recognize and remove duplicate select
> clauses.  That is, "SELECT xxx UNION SELECT xxx" will be folded to
> just "SELECT xxx" ... and that doesn't mean the same thing.

:(

> Assuming that 6.4.2 is doing the Right Thing, I see two possible 
> fixes:
> (1) simplify equal() to say that two T_Query nodes are never equal, or
> (2) modify the planner so that the "select distinct" operation is
> inserted explicitly, and will thus happen even if the UNIONed selects
> are collapsed into just one.

Sounds to me like (2) is the correct way to do it, as long as the
explicit "sort/unique" happens only if the query is collapsed. I haven't
looked at this code, and have no experience with cnfify(), but at the
time it decides to collapse into a single select, can't it ensure that a
sort/unique is done instead? Or is that what you are suggesting?
Wouldn't want to have two sort/unique operations on top of each other...
                    - Tom


Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From
Bruce Momjian
Date:
Was there a conclusion on this?


> The equal() updates I installed yesterday (to fix the "don't know
> whether nodes of type 600 are equal" problem) have had an unintended
> side effect.
> 
> Am I right in thinking that UNION (without ALL) is defined to do a
> DISTINCT on its result, so that duplicates are removed even if the
> duplicates both came from the same source table?  That's what 6.4.2
> does, but I do not know if it's strictly kosher according to the SQL
> spec.
> 
> If so, the code is now busted, because with the equal() extension in
> place, cnfify() is able to recognize and remove duplicate select
> clauses.  That is, "SELECT xxx UNION SELECT xxx" will be folded to
> just "SELECT xxx" ... and that doesn't mean the same thing.
> 
> An actual example: given the data
> 
> play=> select a from tt;
> a
> -
> 1
> 1
> 2
> 3
> (4 rows)
> 
> Under 6.4.2 I get:
> 
> play=> select a from tt union select a from tt;
> a
> -
> 1
> 2
> 3
> (3 rows)
> 
> Note lack of duplicate "1".  Under current sources I get:
> 
> ttest=> select a from tt union select a from tt;
> a
> -
> 1
> 1
> 2
> 3
> (4 rows)
> 
> since the query is effectively reduced to just "select a from tt".
> 
> Assuming that 6.4.2 is doing the Right Thing, I see two possible fixes:
> (1) simplify equal() to say that two T_Query nodes are never equal, or
> (2) modify the planner so that the "select distinct" operation is
> inserted explicitly, and will thus happen even if the UNIONed selects
> are collapsed into just one.
> 
> (1) is a trivial fix of course, but it worries me --- maybe someday
> we will need equal() to give an honest answer for Query nodes.
> But I don't have the expertise to apply (2), and it seems like rather
> a lot of work for a boundary case that isn't really interesting in
> practice.
> 
> Comments?  *Is* 6.4.2 behaving according to the SQL spec?
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From
Thomas Lockhart
Date:
> > Am I right in thinking that UNION (without ALL) is defined to do a
> > DISTINCT on its result, so that duplicates are removed even if the
> > duplicates both came from the same source table?  That's what 6.4.2
> > does, but I do not know if it's strictly kosher according to the SQL
> > spec.

(Just in case this is still active)

Yes, this is the right behavior according to SQL92...
                        - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>>>> Am I right in thinking that UNION (without ALL) is defined to do a
>>>> DISTINCT on its result, so that duplicates are removed even if the
>>>> duplicates both came from the same source table?  That's what 6.4.2
>>>> does, but I do not know if it's strictly kosher according to the SQL
>>>> spec.

> (Just in case this is still active)

> Yes, this is the right behavior according to SQL92...

OK, then 6.5 is still broken :-(.  I know a lot more about the planner
than I did then, so I will see if I can fix it "right" --- that is,
without taking out equal()'s ability to detect equality of Query nodes.

If that seems too hard/risky, I will just lobotomize equal() instead.

Thanks for the reminder, Bruce --- I had forgotten about this issue.
        regards, tom lane


Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From
Taral
Date:
On Sun, 9 May 1999, Thomas Lockhart wrote:

> > > Am I right in thinking that UNION (without ALL) is defined to do a
> > > DISTINCT on its result, so that duplicates are removed even if the
> > > duplicates both came from the same source table?  That's what 6.4.2
> > > does, but I do not know if it's strictly kosher according to the SQL
> > > spec.
> 
> Yes, this is the right behavior according to SQL92...

In which case something should put a DISTINCT on queries using UNION...
since making T_Query nodes never equal is a deoptimization.

Taral



Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From
Bruce Momjian
Date:
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> >>>> Am I right in thinking that UNION (without ALL) is defined to do a
> >>>> DISTINCT on its result, so that duplicates are removed even if the
> >>>> duplicates both came from the same source table?  That's what 6.4.2
> >>>> does, but I do not know if it's strictly kosher according to the SQL
> >>>> spec.
> 
> > (Just in case this is still active)
> 
> > Yes, this is the right behavior according to SQL92...
> 
> OK, then 6.5 is still broken :-(.  I know a lot more about the planner
> than I did then, so I will see if I can fix it "right" --- that is,
> without taking out equal()'s ability to detect equality of Query nodes.
> 
> If that seems too hard/risky, I will just lobotomize equal() instead.
> 
> Thanks for the reminder, Bruce --- I had forgotten about this issue.

Hey, that's why I keep 500 messages in my PostgreSQL mailbox.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From
Bruce Momjian
Date:
Can someone comment on this?  Is it still an issue with cnf'ify removing
duplicate cases?



> The equal() updates I installed yesterday (to fix the "don't know
> whether nodes of type 600 are equal" problem) have had an unintended
> side effect.
> 
> Am I right in thinking that UNION (without ALL) is defined to do a
> DISTINCT on its result, so that duplicates are removed even if the
> duplicates both came from the same source table?  That's what 6.4.2
> does, but I do not know if it's strictly kosher according to the SQL
> spec.
> 
> If so, the code is now busted, because with the equal() extension in
> place, cnfify() is able to recognize and remove duplicate select
> clauses.  That is, "SELECT xxx UNION SELECT xxx" will be folded to
> just "SELECT xxx" ... and that doesn't mean the same thing.
> 
> An actual example: given the data
> 
> play=> select a from tt;
> a
> -
> 1
> 1
> 2
> 3
> (4 rows)
> 
> Under 6.4.2 I get:
> 
> play=> select a from tt union select a from tt;
> a
> -
> 1
> 2
> 3
> (3 rows)
> 
> Note lack of duplicate "1".  Under current sources I get:
> 
> ttest=> select a from tt union select a from tt;
> a
> -
> 1
> 1
> 2
> 3
> (4 rows)
> 
> since the query is effectively reduced to just "select a from tt".
> 
> Assuming that 6.4.2 is doing the Right Thing, I see two possible fixes:
> (1) simplify equal() to say that two T_Query nodes are never equal, or
> (2) modify the planner so that the "select distinct" operation is
> inserted explicitly, and will thus happen even if the UNIONed selects
> are collapsed into just one.
> 
> (1) is a trivial fix of course, but it worries me --- maybe someday
> we will need equal() to give an honest answer for Query nodes.
> But I don't have the expertise to apply (2), and it seems like rather
> a lot of work for a boundary case that isn't really interesting in
> practice.
> 
> Comments?  *Is* 6.4.2 behaving according to the SQL spec?
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Can someone comment on this?  Is it still an issue with cnf'ify removing
> duplicate cases?

This is still an open bug.  I looked at it and concluded that a clean
solution probably requires modifying the parsetree structure for UNIONs.
(Currently, UNION, INTERSECT, EXCEPT are converted to OR, AND, AND NOT
structures with queries as arguments ... which would be OK, except that
the semantics aren't really the same, and the difference between UNION
and UNION ALL doesn't fit into that operator set at all.)

That looked like a lot of work for what is surely a low-priority bug,
so I haven't gotten around to it yet.  Need more round tuits.

In the meantime, it should be on the TODO list: * SELECT foo UNION SELECT foo is incorrectly simplified to SELECT foo
        regards, tom lane

>> The equal() updates I installed yesterday (to fix the "don't know
>> whether nodes of type 600 are equal" problem) have had an unintended
>> side effect.
>> 
>> Am I right in thinking that UNION (without ALL) is defined to do a
>> DISTINCT on its result, so that duplicates are removed even if the
>> duplicates both came from the same source table?  That's what 6.4.2
>> does, but I do not know if it's strictly kosher according to the SQL
>> spec.
>> 
>> If so, the code is now busted, because with the equal() extension in
>> place, cnfify() is able to recognize and remove duplicate select
>> clauses.  That is, "SELECT xxx UNION SELECT xxx" will be folded to
>> just "SELECT xxx" ... and that doesn't mean the same thing.
>> 
>> An actual example: given the data
>> 
>> play=> select a from tt;
>> a
>> -
>> 1
>> 1
>> 2
>> 3
>> (4 rows)
>> 
>> Under 6.4.2 I get:
>> 
>> play=> select a from tt union select a from tt;
>> a
>> -
>> 1
>> 2
>> 3
>> (3 rows)
>> 
>> Note lack of duplicate "1".  Under current sources I get:
>> 
>> ttest=> select a from tt union select a from tt;
>> a
>> -
>> 1
>> 1
>> 2
>> 3
>> (4 rows)
>> 
>> since the query is effectively reduced to just "select a from tt".
>> 
>> Assuming that 6.4.2 is doing the Right Thing, I see two possible fixes:
>> (1) simplify equal() to say that two T_Query nodes are never equal, or
>> (2) modify the planner so that the "select distinct" operation is
>> inserted explicitly, and will thus happen even if the UNIONed selects
>> are collapsed into just one.
>> 
>> (1) is a trivial fix of course, but it worries me --- maybe someday
>> we will need equal() to give an honest answer for Query nodes.
>> But I don't have the expertise to apply (2), and it seems like rather
>> a lot of work for a boundary case that isn't really interesting in
>> practice.
>> 
>> Comments?  *Is* 6.4.2 behaving according to the SQL spec?
>> 
>> regards, tom lane
>> 
>> 


> -- 
>   Bruce Momjian                        |  http://www.op.net/~candle
>   maillist@candle.pha.pa.us            |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026