Thread: Unions and Grouping

Unions and Grouping

From
"Aaron Bono"
Date:
I have a question about the SQL Specifications in regards to Unions...<br /><br />I recently put together a query that
involvedunions similar to the following:<br /><br />SELECT<br />    'Query 1' as id,<br />    my_value<br />FROM
my_view<br />UNION<br />SELECT<br />    'Query 2' as id,<br />    my_value<br />FROM my_other_view<br />;<br /><br
/>Thefirst query in the union gave me 39 records and the second gave me 34 records.  I was expecting the union to give
me39 + 34 = 73 records. <br /><br />When I ran this against DB2, I got 35 records (not sure about PostgreSQL - will
haveto try it when I get home).  What I found was when I did a group by my_value on each query I got two values that
thenadded to 35.  The reason was, my_value was duplicated in my_view and in my_other_view.  What the Union appeared to
bedoing was to gather the data and then do a group by on the complete results.  I expected it to only eliminate
duplicatesBETWEEN the two queries, not WITHIN the queries. <br /><br />My question, what do the SQL Specifications say
shouldhappen on a Union?  Is it supposed to eliminate duplicates even WITHIN the individual queries that are being
unioned?<br/><br />Thanks!<br clear="all" /><br />-- <br />
==================================================================<br/>   Aaron Bono<br />   Aranya Software
Technologies,Inc.<br />   <a href="http://www.aranya.com">http://www.aranya.com</a><br />   <a
href="http://codeelixir.com">http://codeelixir.com</a><br
/>================================================================== 

Re: Unions and Grouping

From
Tom Lane
Date:
"Aaron Bono" <postgresql@aranya.com> writes:
> My question, what do the SQL Specifications say should happen on a Union?
> Is it supposed to eliminate duplicates even WITHIN the individual queries
> that are being unioned?

Yes.  SQL92 7.10 saith:
           b) If a set operator is specified, then the result of applying             the set operator is a table
containingthe following rows:
 
             i) Let R be a row that is a duplicate of some row in T1 or of                some row in T2 or both. Let m
bethe number of duplicates                of R in T1 and let n be the number of duplicates of R in                T2,
wherem >= 0 and n >= 0.
 
            ii) If ALL is not specified, then
                Case:
                1) If UNION is specified, then
                  Case:
                  A) If m > 0 or n > 0, then T contains exactly one dupli-                     cate of R.
                  B) Otherwise, T contains no duplicate of R.

        regards, tom lane


Re: Unions and Grouping

From
"Aaron Bono"
Date:
On 12/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Aaron Bono" <postgresql@aranya.com> writes:
> My question, what do the SQL Specifications say should happen on a Union?
> Is it supposed to eliminate duplicates even WITHIN the individual queries
> that are being unioned?

Yes.  SQL92 7.10 saith:

            b) If a set operator is specified, then the result of applying
              the set operator is a table containing the following rows:

              i) Let R be a row that is a duplicate of some row in T1 or of
                 some row in T2 or both. Let m be the number of duplicates
                 of R in T1 and let n be the number of duplicates of R in
                 T2, where m >= 0 and n >= 0.

             ii) If ALL is not specified, then

                 Case:

                 1) If UNION is specified, then

                   Case:

                   A) If m > 0 or n > 0, then T contains exactly one dupli-
                      cate of R.

                   B) Otherwise, T contains no duplicate of R.


                        regards, tom lane

So if I don't want the duplicated WITHIN the two queries to be eliminated, I use UNION ALL?

Thanks!


--
==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

Re: Unions and Grouping

From
Ragnar
Date:
On fös, 2006-12-15 at 16:17 -0600, Aaron Bono wrote:
> On 12/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>         "Aaron Bono" <postgresql@aranya.com> writes:
>         > My question, what do the SQL Specifications say should
>         happen on a Union?
>         > Is it supposed to eliminate duplicates even WITHIN the
>         individual queries 
>         > that are being unioned?
>         
>         Yes.  SQL92 7.10 saith:
> [snip SQL92 standardese]

> 
> So if I don't want the duplicated WITHIN the two queries to be
> eliminated, I use UNION ALL?

with UNION ALL, no duplicates will be discarded, either
"within" or "between" the queries.

if I understand you, you are looking for having only queries
that are both in Q1 and Q2 removed, but not for example, 
any duplicates in Q1 that are not in Q2.

if this is what you want, then I think you may not have 
thought it through. what about a row that is twice in
Q1 and three times in Q2? how many copies of this row
should be in the result?

gnari





Re: Unions and Grouping

From
"Aaron Bono"
Date:
On 12/15/06, Ragnar <gnari@hive.is> wrote:
On fös, 2006-12-15 at 16:17 -0600, Aaron Bono wrote:
> On 12/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>         "Aaron Bono" < postgresql@aranya.com> writes:
>         > My question, what do the SQL Specifications say should
>         happen on a Union?
>         > Is it supposed to eliminate duplicates even WITHIN the
>         individual queries
>         > that are being unioned?
>
>         Yes.  SQL92 7.10 saith:
> [snip SQL92 standardese]

>
> So if I don't want the duplicated WITHIN the two queries to be
> eliminated, I use UNION ALL?

with UNION ALL, no duplicates will be discarded, either
"within" or "between" the queries.

if I understand you, you are looking for having only queries
that are both in Q1 and Q2 removed, but not for example,
any duplicates in Q1 that are not in Q2.

if this is what you want, then I think you may not have
thought it through. what about a row that is twice in
Q1 and three times in Q2? how many copies of this row
should be in the result?

gnari


What I want is two-fold:

1. I want to eliminate all duplicates which is exactly what UNION does - so I am using a straight UNION
2. I want to understand how UNION and UNION ALL work (which I do now - thanks)

I didn't really want the duplicates between the two queries eliminated but not within.  That is just what I had expected to see the first time I ran the query.  As you pointed out, this really isn't desirable so I am glad it doesn't work this way.

Thanks for all the input.

--
==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================