Thread: Unions and Grouping
I have a question about the SQL Specifications in regards to Unions...<br /><br />I recently put together a query that involvedunions similar to the following:<br /><br />SELECT<br /> 'Query 1' as id,<br /> my_value<br />FROM my_view<br />UNION<br />SELECT<br /> 'Query 2' as id,<br /> my_value<br />FROM my_other_view<br />;<br /><br />Thefirst query in the union gave me 39 records and the second gave me 34 records. I was expecting the union to give me39 + 34 = 73 records. <br /><br />When I ran this against DB2, I got 35 records (not sure about PostgreSQL - will haveto try it when I get home). What I found was when I did a group by my_value on each query I got two values that thenadded to 35. The reason was, my_value was duplicated in my_view and in my_other_view. What the Union appeared to bedoing was to gather the data and then do a group by on the complete results. I expected it to only eliminate duplicatesBETWEEN the two queries, not WITHIN the queries. <br /><br />My question, what do the SQL Specifications say shouldhappen on a Union? Is it supposed to eliminate duplicates even WITHIN the individual queries that are being unioned?<br/><br />Thanks!<br clear="all" /><br />-- <br /> ==================================================================<br/> Aaron Bono<br /> Aranya Software Technologies,Inc.<br /> <a href="http://www.aranya.com">http://www.aranya.com</a><br /> <a href="http://codeelixir.com">http://codeelixir.com</a><br />==================================================================
"Aaron Bono" <postgresql@aranya.com> writes: > My question, what do the SQL Specifications say should happen on a Union? > Is it supposed to eliminate duplicates even WITHIN the individual queries > that are being unioned? Yes. SQL92 7.10 saith: b) If a set operator is specified, then the result of applying the set operator is a table containingthe following rows: i) Let R be a row that is a duplicate of some row in T1 or of some row in T2 or both. Let m bethe number of duplicates of R in T1 and let n be the number of duplicates of R in T2, wherem >= 0 and n >= 0. ii) If ALL is not specified, then Case: 1) If UNION is specified, then Case: A) If m > 0 or n > 0, then T contains exactly one dupli- cate of R. B) Otherwise, T contains no duplicate of R. regards, tom lane
On 12/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So if I don't want the duplicated WITHIN the two queries to be eliminated, I use UNION ALL?
Thanks!
--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
"Aaron Bono" <postgresql@aranya.com> writes:
> My question, what do the SQL Specifications say should happen on a Union?
> Is it supposed to eliminate duplicates even WITHIN the individual queries
> that are being unioned?
Yes. SQL92 7.10 saith:
b) If a set operator is specified, then the result of applying
the set operator is a table containing the following rows:
i) Let R be a row that is a duplicate of some row in T1 or of
some row in T2 or both. Let m be the number of duplicates
of R in T1 and let n be the number of duplicates of R in
T2, where m >= 0 and n >= 0.
ii) If ALL is not specified, then
Case:
1) If UNION is specified, then
Case:
A) If m > 0 or n > 0, then T contains exactly one dupli-
cate of R.
B) Otherwise, T contains no duplicate of R.
regards, tom lane
So if I don't want the duplicated WITHIN the two queries to be eliminated, I use UNION ALL?
Thanks!
--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
On fös, 2006-12-15 at 16:17 -0600, Aaron Bono wrote: > On 12/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Aaron Bono" <postgresql@aranya.com> writes: > > My question, what do the SQL Specifications say should > happen on a Union? > > Is it supposed to eliminate duplicates even WITHIN the > individual queries > > that are being unioned? > > Yes. SQL92 7.10 saith: > [snip SQL92 standardese] > > So if I don't want the duplicated WITHIN the two queries to be > eliminated, I use UNION ALL? with UNION ALL, no duplicates will be discarded, either "within" or "between" the queries. if I understand you, you are looking for having only queries that are both in Q1 and Q2 removed, but not for example, any duplicates in Q1 that are not in Q2. if this is what you want, then I think you may not have thought it through. what about a row that is twice in Q1 and three times in Q2? how many copies of this row should be in the result? gnari
On 12/15/06, Ragnar <gnari@hive.is> wrote:
What I want is two-fold:
1. I want to eliminate all duplicates which is exactly what UNION does - so I am using a straight UNION
2. I want to understand how UNION and UNION ALL work (which I do now - thanks)
I didn't really want the duplicates between the two queries eliminated but not within. That is just what I had expected to see the first time I ran the query. As you pointed out, this really isn't desirable so I am glad it doesn't work this way.
Thanks for all the input.
--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
On fös, 2006-12-15 at 16:17 -0600, Aaron Bono wrote:
> On 12/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Aaron Bono" < postgresql@aranya.com> writes:
> > My question, what do the SQL Specifications say should
> happen on a Union?
> > Is it supposed to eliminate duplicates even WITHIN the
> individual queries
> > that are being unioned?
>
> Yes. SQL92 7.10 saith:
> [snip SQL92 standardese]
>
> So if I don't want the duplicated WITHIN the two queries to be
> eliminated, I use UNION ALL?
with UNION ALL, no duplicates will be discarded, either
"within" or "between" the queries.
if I understand you, you are looking for having only queries
that are both in Q1 and Q2 removed, but not for example,
any duplicates in Q1 that are not in Q2.
if this is what you want, then I think you may not have
thought it through. what about a row that is twice in
Q1 and three times in Q2? how many copies of this row
should be in the result?
gnari
What I want is two-fold:
1. I want to eliminate all duplicates which is exactly what UNION does - so I am using a straight UNION
2. I want to understand how UNION and UNION ALL work (which I do now - thanks)
I didn't really want the duplicates between the two queries eliminated but not within. That is just what I had expected to see the first time I ran the query. As you pointed out, this really isn't desirable so I am glad it doesn't work this way.
Thanks for all the input.
--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================