Thread: Phantom row from aggregate in self-join in 6.5
Using 6.5 (via Thomas Lockhart's Linux RPM build of Jul 2), I get a phantom row when doing the following: create table foo (a int); select t1.a, count(*) from foo t1, foo t2 group by t1.a; I get a|count -+----- | 0 (1 row) instead of zero rows. The row has an a column of "NULL". This happens even if I create table foo as "(a int not null)". I've checked that Informix 7.3LE gives zero rows as expected. Further, if I add having t1.a is not null to the select query to try to get rid of the bogus row then it gives ERROR: SELECT/HAVING requires aggregates to be valid but I don't know quite what that's telling me. Some of you might remember I had that other multi-aggregate/view problem recently which turned out to be fairly fundamentally unfixable due to the way postgres holds views internally in a close-to-SQL format rather than the underlying relational algebra. Can anyone tell me if this phantom row thing is another consequence of the implementation of aggregates in postgres or is just a buglet that can be fixed fairly easily? Thanks, --Malcolm -- Malcolm Beattie <mbeattie@sable.ox.ac.uk> Unix Systems Programmer Oxford University Computing Services
Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes: > Using 6.5 (via Thomas Lockhart's Linux RPM build of Jul 2), I get a > phantom row when doing the following: > create table foo (a int); > select t1.a, count(*) from foo t1, foo t2 group by t1.a; > I get > a|count > -+----- > | 0 > (1 row) > instead of zero rows. It's not a bug, it's a feature ... or at least there are some around here who claim that the behavior is OK. I think they're wrong, but if you want it changed you'll need to cite chapter and verse from the SQL92 standard, not just assert that Informix does it differently. You'll find several past discussions of this point in the pgsql-hackers archives, and they all seem to have ended inconclusively. > is it just a buglet that can be fixed fairly easily? I think it would not be hard to fix, if we have a consensus that the behavior should change. regards, tom lane
At 20:52 22/07/99 -0400, you wrote: >Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes: >> Using 6.5 (via Thomas Lockhart's Linux RPM build of Jul 2), I get a >> phantom row when doing the following: >> create table foo (a int); >> select t1.a, count(*) from foo t1, foo t2 group by t1.a; >> I get >> a|count >> -+----- >> | 0 >> (1 row) >> instead of zero rows. > >It's not a bug, it's a feature ... or at least there are some around >here who claim that the behavior is OK. I think they're wrong, but >if you want it changed you'll need to cite chapter and verse from the >SQL92 standard, not just assert that Informix does it differently. I've now checked Dec Rdb, SQL/Server, and MS-Access - and they return 0 rows. Add this to Informix, and one begins to wonderif there are any that match the Postgres behaviour? Any idea where I can find a copy of the SQL92 standard on the net? >You'll find several past discussions of this point in the pgsql-hackers >archives, and they all seem to have ended inconclusively. I had a quick look at discussions involving informix, but could not find anything. Can you give a little more informationabout the past discussions, and specifically, what the reasons for preserving this behaviour were? >> is it just a buglet that can be fixed fairly easily? > >I think it would not be hard to fix, if we have a consensus that the >behavior should change. > ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 11:39 AM 7/23/99 +1000, Philip Warner wrote: >I've now checked Dec Rdb, SQL/Server, and MS-Access - and they return 0 rows. Add this to Informix, and one begins to wonder if there are any that match the Postgres behaviour? >Any idea where I can find a copy of the SQL92 standard on the net? I'd like an answer to this, too :) It may be that you've stumbled into an area the standard's either left "implementation-dependent", "undefined", or simply forgotten or unthought-of. (can you tell I've been drafted into ANSI/ISO standards efforts in the past for Pascal and Modula-2?) Still, I must say that a row returning "0" in response to a count(*) isn't at all suprising, I guess it's a matter of whether or not the count(*) or the specific column being extracted determines the behavior. >>You'll find several past discussions of this point in the pgsql-hackers >>archives, and they all seem to have ended inconclusively. >I had a quick look at discussions involving informix, but could not find anything. Can you give a little more information about the past discussions, and specifically, what the reasons for preserving this behaviour were? First, I wouldn't trust Access to be much of an SQL standards judge. If nothing else, MS's collaboration with Sybase (SQL/Server) might perhaps color MS's view of what the standard sez. Not to mention the poaching of parser/semantic code, etc... And doesn't DEC Rdb have some genealogical relationship to SQL/Server? (I could be WAY off base here) - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, and other goodies at http://donb.photo.net
At 18:57 22/07/99 -0700, you wrote: >At 11:39 AM 7/23/99 +1000, Philip Warner wrote: > >>I've now checked Dec Rdb, SQL/Server, and MS-Access - and they return 0 >rows. Add this to Informix, and one begins to wonder if there are any that >match the Postgres behaviour? > >>Any idea where I can find a copy of the SQL92 standard on the net? > >I'd like an answer to this, too :) I have found a US based organization that sell 38MB file for $220...I guess I'll go to a library. >It may be that you've stumbled into an area the standard's either >left "implementation-dependent", "undefined", or simply forgotten >or unthought-of. (can you tell I've been drafted into ANSI/ISO >standards efforts in the past for Pascal and Modula-2?) If that's the case, then the example below seems to produce an inconsistency: IMO, changing the columns selected should notchange the number of rows returned. >Still, I must say that a row returning "0" in response to a >count(*) isn't at all suprising, I guess it's a matter of >whether or not the count(*) or the specific column being >extracted determines the behavior. Count returning 0 is good, the problem is that: select t1.a from foo t1, foo t2 group by t1.a; ^ +--- No count(*) returns 0 rows (fine), but that select t1.a, count(*) from foo t1, foo t2 group by t1.a; returns 1 row, which is weird. > >First, I wouldn't trust Access to be much of an SQL standards judge. >If nothing else, MS's collaboration with Sybase (SQL/Server) might >perhaps color MS's view of what the standard sez. Not to mention >the poaching of parser/semantic code, etc... I agree, but it all adds a little weight to the argument - maybe? >And doesn't DEC Rdb have some genealogical relationship to SQL/Server? >(I could be WAY off base here) I don't think so. RDB was at version 3 in 1986 - that's when I started using it. It has had AFAICT a totally separate developmentstream from MS/Sybase etc, at least since that time, and almost certainly from its genesis. It was purchsed byOracle a year or two ago, but it still largely the same product. If anything, Oracle have improved it a little. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote: > > At 18:57 22/07/99 -0700, you wrote: > >At 11:39 AM 7/23/99 +1000, Philip Warner wrote: > > > >>I've now checked Dec Rdb, SQL/Server, and MS-Access - and they return 0 > >rows. Add this to Informix, and one begins to wonder if there are any that > >match the Postgres behaviour? > > > >>Any idea where I can find a copy of the SQL92 standard on the net? > > > >I'd like an answer to this, too :) > > I have found a US based organization that sell 38MB file for $220...I guess I'll go to a library. I have "ISO and ANSI SQL3 Working Draft-August 12, 1993", 3M file. Any one intrested? Vadim
At 10:40 23/07/99 +0800, you wrote: > >I have "ISO and ANSI SQL3 Working Draft-August 12, 1993", 3M file. >Any one intrested? > Very! E-mail it. Or put it on the PG site, or tell me how to FTP it... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote: > > At 10:40 23/07/99 +0800, you wrote: > > > >I have "ISO and ANSI SQL3 Working Draft-August 12, 1993", 3M file. > >Any one intrested? > > > > Very! E-mail it. Or put it on the PG site, or tell me how to FTP it... I don't remember where I got it and what are copyrights, so I'll e-mail it to you, and anyone, in private mail. Vadim
Don Baccus <dhogaza@pacifier.com> writes: > Still, I must say that a row returning "0" in response to a > count(*) isn't at all suprising, I guess it's a matter of > whether or not the count(*) or the specific column being > extracted determines the behavior. Neither, it's GROUP BY that creates the issue. If you do an ungrouped query with aggregates, say SELECT count(*) FROM table WHERE someCondition; you will get one and only one row produced, with default values for the aggregates if there are no input rows (ie, either an empty table to start with, or nothing gets by the WHERE). Everybody seems to be happy with this. The question is what happens when GROUP BY enters the picture. There is a faction that thinks that if there are no input rows then you should still get one default row out. That makes no sense to me; it seems to me you should get one aggregated row per group if you have aggregates with GROUP BY, and if there are no input rows then there are no groups. But I have not burrowed into the SQL standard to try to develop a bulletproof argument for that position. >>> You'll find several past discussions of this point in the pgsql-hackers >>> archives, and they all seem to have ended inconclusively. >> I had a quick look at discussions involving informix, but could not find > anything. Informix is not the issue. Look for "GROUP BY" and aggregates. regards, tom lane
At 20:52 22/07/99 -0400, Tom Lane wrote: >Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes: >> Using 6.5 (via Thomas Lockhart's Linux RPM build of Jul 2), I get a >> phantom row when doing the following: >> create table foo (a int); >> select t1.a, count(*) from foo t1, foo t2 group by t1.a; >> I get >> a|count >> -+----- >> | 0 >> (1 row) >> instead of zero rows. > >if you want it changed you'll need to cite chapter and verse from the >SQL92 standard, not just assert that Informix does it differently. Sadly, I only have access to a 1993 draft standard, but the following is from section 7.10: "The result of the <group by clause> is a partitioning of T into a set of groups. The set is the minimum number ofgroups such that, for each grouping column of each group of more than one row, no two values of that grouping columnare distinct." >From my reading of the standad, 'T' is the result of the select statement prior to being grouped. It would seem that ifT contains no rows, then "the minimum number of groups" would have to be zero. Other references, such as: 2) Let CR be the <column reference> with <column name> CN identi- fying the grouping column. Every rowof a given group contains equal values of CN. When a <search condition> or <value expres- sion> isapplied to a group, CR is a reference to the value of CN. (General Rules, Section 7.10) Would seem to indicate that any grouped result row must be supported by underlying rows on the ungrouped result set. Finally, using the above example: >> create table foo (a int); >> select t1.a, count(*) from foo t1, foo t2 group by t1.a; >> I get >> a|count >> -+----- >> | 0 >> (1 row) the values returned in the column 'a' NEVER appears in the source table. Is there anyone out there who believes this is NOTa problem? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
> > create table foo (a int); > > select t1.a, count(*) from foo t1, foo t2 group by t1.a; > > I get > > a|count > > -+----- > > | 0 > > (1 row) > > instead of zero rows. > It's not a bug, it's a feature ... or at least there are some around > here who claim that the behavior is OK. I think they're wrong, but > if you want it changed you'll need to cite chapter and verse from the > SQL92 standard, not just assert that Informix does it differently. I don't recall which way I argued before (in fact, I don't recall this particular example), but I do remember arguing (with righteous conviction) that the query select count(*) from foo; should return a single row containing a zero value. Did we infer from that some behavior for "group by" (I can't recall any)? istm, at least today, that the behavior for the group-by is wrong, but we'd better not change the behavior of my example query... - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Philip Warner wrote: > >>Any idea where I can find a copy of the SQL92 standard on the net? > >I'd like an answer to this, too :) > I have found a US based organization that sell 38MB file for $220...I guess I'll go to a library. Go to http://www.contrib.andrew.cmu.edu/~shadow/sql.html for a good, if a little out of date, SQL page that lists several reference works. Lamar Owen WGCR Internet Radio
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > I don't recall which way I argued before (in fact, I don't recall this > particular example), but I do remember arguing (with righteous > conviction) that the query > select count(*) from foo; > should return a single row containing a zero value. No argument about that one. It's the GROUP BY case that's at issue. > Did we infer from > that some behavior for "group by" (I can't recall any)? istm, at least > today, that the behavior for the group-by is wrong, IIRC, you were the main advocate of the position that the code's existing behavior is correct. Does that mean I can go change it? ;-) regards, tom lane
> IIRC, you were the main advocate of the position that the code's > existing behavior is correct. Does that mean I can go change it? ;-) Yes, after you slap me around a bit for being so wrong. Do you remember when we were discussing it? I want to go back and see why I thought this was right. I'm guessing that the example was not phrased in exactly this way, and that there may be some other behavior we need to maintain. (Otherwise, I might have used up my "one wrong idea per year" ;) - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> IIRC, you were the main advocate of the position that the code's >> existing behavior is correct. Does that mean I can go change it? ;-) > Yes, after you slap me around a bit for being so wrong. Do you > remember when we were discussing it? I want to go back and see why I > thought this was right. I'm guessing that the example was not phrased > in exactly this way, and that there may be some other behavior we need > to maintain. (Otherwise, I might have used up my "one wrong idea per > year" ;) Actually, it may be my recollection that's wrong. The only discussion of the point that I can find right now is the thread "SUM() and GROUP BY" from around 1/12/99 in pghackers, and it seems to be mostly focused on arguments about whether you should get NULL or 0 from a no-input SUM... I would've sworn I remember a couple of other related threads in the past year or so, but I cannot find them now. Anyway, unless someone speaks up in favor of the way the code currently works, I will see about changing the results for the GROUP-BY-with-no- input-rows case. I got a few other things to do first though. regards, tom lane
Thus spake Tom Lane > IIRC, you were the main advocate of the position that the code's > existing behavior is correct. Does that mean I can go change it? ;-) I vote (again) for yes. It's so annoying having to add code to test each returned value against 0 just in case there is only one returned value that needs to be checked. "SELECT COUNT(*) FROM table" should always return one row but "SELECT COUNT(*) FROM table GROUP BY somethng" returns a variable number of rows anyway so what's the harm in extending "one or more" to "zero or more" returned rows? We have to iterate through whatever the count is anyway in our programs. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 424 2871 (DoD#0082) (eNTP) | what's for dinner.