Thread: Phantom row from aggregate in self-join in 6.5

Phantom row from aggregate in self-join in 6.5

From

Malcolm Beattie

Date:

22 July 1999, 09:38:28

Using 6.5 (via Thomas Lockhart's Linux RPM build of Jul 2), I get a
phantom row when doing the following:
   create table foo (a int);   select t1.a, count(*) from foo t1, foo t2 group by t1.a;

I get
   a|count   -+-----    |    0   (1 row)

instead of zero rows. The row has an a column of "NULL". This happens
even if I create table foo as "(a int not null)".

I've checked that Informix 7.3LE gives zero rows as expected.

Further, if I add   having t1.a is not null
to the select query to try to get rid of the bogus row then it gives   ERROR:  SELECT/HAVING requires aggregates to be
valid
but I don't know quite what that's telling me.

Some of you might remember I had that other multi-aggregate/view
problem recently which turned out to be fairly fundamentally unfixable
due to the way postgres holds views internally in a close-to-SQL
format rather than the underlying relational algebra. Can anyone tell
me if this phantom row thing is another consequence of the
implementation of aggregates in postgres or is just a buglet that can
be fixed fairly easily?

Thanks,
--Malcolm

-- 
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Tom Lane

Date:

22 July 1999, 20:54:35

Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes:
> Using 6.5 (via Thomas Lockhart's Linux RPM build of Jul 2), I get a
> phantom row when doing the following:
>     create table foo (a int);
>     select t1.a, count(*) from foo t1, foo t2 group by t1.a;
> I get
>     a|count
>     -+-----
>      |    0
>     (1 row)
> instead of zero rows.

It's not a bug, it's a feature ... or at least there are some around
here who claim that the behavior is OK.  I think they're wrong, but
if you want it changed you'll need to cite chapter and verse from the
SQL92 standard, not just assert that Informix does it differently.
You'll find several past discussions of this point in the pgsql-hackers
archives, and they all seem to have ended inconclusively.

> is it just a buglet that can be fixed fairly easily?

I think it would not be hard to fix, if we have a consensus that the
behavior should change.
        regards, tom lane

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Philip Warner

Date:

22 July 1999, 21:39:36

At 20:52 22/07/99 -0400, you wrote:
>Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes:
>> Using 6.5 (via Thomas Lockhart's Linux RPM build of Jul 2), I get a
>> phantom row when doing the following:
>>     create table foo (a int);
>>     select t1.a, count(*) from foo t1, foo t2 group by t1.a;
>> I get
>>     a|count
>>     -+-----
>>      |    0
>>     (1 row)
>> instead of zero rows.
>
>It's not a bug, it's a feature ... or at least there are some around
>here who claim that the behavior is OK.  I think they're wrong, but
>if you want it changed you'll need to cite chapter and verse from the
>SQL92 standard, not just assert that Informix does it differently.

I've now checked Dec Rdb, SQL/Server, and MS-Access - and they return 0 rows. Add this to Informix, and one begins to
wonderif there are any that match the Postgres behaviour?
 

Any idea where I can find a copy of the SQL92 standard on the net?


>You'll find several past discussions of this point in the pgsql-hackers
>archives, and they all seem to have ended inconclusively.

I had a quick look at discussions involving informix, but could not find anything. Can you give a little more
informationabout the past discussions, and specifically, what the reasons for preserving this behaviour were?
 


>> is it just a buglet that can be fixed fairly easily?
>
>I think it would not be hard to fix, if we have a consensus that the
>behavior should change.
>

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Don Baccus

Date:

22 July 1999, 22:02:38

At 11:39 AM 7/23/99 +1000, Philip Warner wrote:

>I've now checked Dec Rdb, SQL/Server, and MS-Access - and they return 0
rows. Add this to Informix, and one begins to wonder if there are any that
match the Postgres behaviour?

>Any idea where I can find a copy of the SQL92 standard on the net?

I'd like an answer to this, too :)

It may be that you've stumbled into an area the standard's either
left "implementation-dependent", "undefined", or simply forgotten
or unthought-of.  (can you tell I've been drafted into ANSI/ISO
standards efforts in the past for Pascal and Modula-2?)

Still, I must say that a row returning "0" in response to a 
count(*) isn't at all suprising, I guess it's a matter of 
whether or not the count(*) or the specific column being
extracted determines the behavior.

>>You'll find several past discussions of this point in the pgsql-hackers
>>archives, and they all seem to have ended inconclusively.

>I had a quick look at discussions involving informix, but could not find
anything. Can you give a little more information about the past
discussions, and specifically, what the reasons for preserving this
behaviour were?

First, I wouldn't trust Access to be much of an SQL standards judge.
If nothing else, MS's collaboration with Sybase (SQL/Server) might
perhaps color MS's view of what the standard sez. Not to mention 
the poaching of parser/semantic code, etc...

And doesn't DEC Rdb have some genealogical relationship to SQL/Server?
(I could be WAY off base here)

- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, and other goodies at
http://donb.photo.net

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Philip Warner

Date:

22 July 1999, 22:23:37

At 18:57 22/07/99 -0700, you wrote:
>At 11:39 AM 7/23/99 +1000, Philip Warner wrote:
>
>>I've now checked Dec Rdb, SQL/Server, and MS-Access - and they return 0
>rows. Add this to Informix, and one begins to wonder if there are any that
>match the Postgres behaviour?
>
>>Any idea where I can find a copy of the SQL92 standard on the net?
>
>I'd like an answer to this, too :)

I have found a US based organization that sell 38MB file for $220...I guess I'll go to a library.

>It may be that you've stumbled into an area the standard's either
>left "implementation-dependent", "undefined", or simply forgotten
>or unthought-of.  (can you tell I've been drafted into ANSI/ISO
>standards efforts in the past for Pascal and Modula-2?)

If that's the case, then the example below seems to produce an inconsistency: IMO, changing the columns selected should
notchange the number of rows returned.
 

>Still, I must say that a row returning "0" in response to a 
>count(*) isn't at all suprising, I guess it's a matter of 
>whether or not the count(*) or the specific column being
>extracted determines the behavior.

Count returning 0 is good, the problem is that:
   select t1.a from foo t1, foo t2 group by t1.a;              ^              +--- No count(*)

returns 0 rows (fine), but that 
   select t1.a, count(*) from foo t1, foo t2 group by t1.a;

returns 1 row, which is weird.


>
>First, I wouldn't trust Access to be much of an SQL standards judge.
>If nothing else, MS's collaboration with Sybase (SQL/Server) might
>perhaps color MS's view of what the standard sez. Not to mention 
>the poaching of parser/semantic code, etc...

I agree, but it all adds a little weight to the argument - maybe?


>And doesn't DEC Rdb have some genealogical relationship to SQL/Server?
>(I could be WAY off base here)

I don't think so. RDB was at version 3 in 1986 - that's when I started using it. It has had AFAICT a totally separate
developmentstream from MS/Sybase etc, at least since that time, and almost certainly from its genesis. It was purchsed
byOracle a year or two ago, but it still largely the same product. If anything, Oracle have improved it a little.
 



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Vadim Mikheev

Date:

22 July 1999, 22:41:37

Philip Warner wrote:
> 
> At 18:57 22/07/99 -0700, you wrote:
> >At 11:39 AM 7/23/99 +1000, Philip Warner wrote:
> >
> >>I've now checked Dec Rdb, SQL/Server, and MS-Access - and they return 0
> >rows. Add this to Informix, and one begins to wonder if there are any that
> >match the Postgres behaviour?
> >
> >>Any idea where I can find a copy of the SQL92 standard on the net?
> >
> >I'd like an answer to this, too :)
> 
> I have found a US based organization that sell 38MB file for $220...I guess I'll go to a library.

I have "ISO and ANSI SQL3 Working Draft-August 12, 1993", 3M file.
Any one intrested?

Vadim

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Philip Warner

Date:

22 July 1999, 22:44:37

At 10:40 23/07/99 +0800, you wrote:
>
>I have "ISO and ANSI SQL3 Working Draft-August 12, 1993", 3M file.
>Any one intrested?
>

Very! E-mail it. Or put it on the PG site, or tell me how to FTP it...

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Vadim Mikheev

Date:

22 July 1999, 22:58:37

Philip Warner wrote:
> 
> At 10:40 23/07/99 +0800, you wrote:
> >
> >I have "ISO and ANSI SQL3 Working Draft-August 12, 1993", 3M file.
> >Any one intrested?
> >
> 
> Very! E-mail it. Or put it on the PG site, or tell me how to FTP it...

I don't remember where I got it and what are copyrights, so
I'll e-mail it to you, and anyone, in private mail.

Vadim

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Tom Lane

Date:

22 July 1999, 23:29:38

Don Baccus <dhogaza@pacifier.com> writes:
> Still, I must say that a row returning "0" in response to a 
> count(*) isn't at all suprising, I guess it's a matter of 
> whether or not the count(*) or the specific column being
> extracted determines the behavior.

Neither, it's GROUP BY that creates the issue.

If you do an ungrouped query with aggregates, say
SELECT count(*) FROM table WHERE someCondition;

you will get one and only one row produced, with default values for
the aggregates if there are no input rows (ie, either an empty table
to start with, or nothing gets by the WHERE).  Everybody seems
to be happy with this.

The question is what happens when GROUP BY enters the picture.
There is a faction that thinks that if there are no input rows
then you should still get one default row out.  That makes no
sense to me; it seems to me you should get one aggregated row per
group if you have aggregates with GROUP BY, and if there are
no input rows then there are no groups.  But I have not burrowed
into the SQL standard to try to develop a bulletproof argument
for that position.

>>> You'll find several past discussions of this point in the pgsql-hackers
>>> archives, and they all seem to have ended inconclusively.

>> I had a quick look at discussions involving informix, but could not find
> anything.

Informix is not the issue.  Look for "GROUP BY" and aggregates.
        regards, tom lane

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Philip Warner

Date:

22 July 1999, 23:30:38

At 20:52 22/07/99 -0400, Tom Lane wrote:
>Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes:
>> Using 6.5 (via Thomas Lockhart's Linux RPM build of Jul 2), I get a
>> phantom row when doing the following:
>>     create table foo (a int);
>>     select t1.a, count(*) from foo t1, foo t2 group by t1.a;
>> I get
>>     a|count
>>     -+-----
>>      |    0
>>     (1 row)
>> instead of zero rows.
>

>if you want it changed you'll need to cite chapter and verse from the
>SQL92 standard, not just assert that Informix does it differently.

Sadly, I only have access to a 1993 draft standard, but the following is from section 7.10:
   "The result of the <group by clause> is a partitioning of T into    a set of groups. The set is the minimum number
ofgroups such    that, for each grouping column of each group of more than one    row, no two values of that grouping
columnare distinct."
 

>From my reading of the standad, 'T' is the result of the select statement prior to being grouped. It would seem that
ifT contains no rows, then "the minimum number of groups" would have to be zero.
 

Other references, such as:
        2) Let CR be the <column reference> with <column name> CN identi-           fying the grouping column. Every
rowof a given group contains           equal values of CN. When a <search condition> or <value expres-           sion>
isapplied to a group, CR is a reference to the value of           CN.
 
        (General Rules, Section 7.10)

Would seem to indicate that any grouped result row must be supported by underlying rows on the ungrouped result set.

Finally, using the above example:

>>     create table foo (a int);
>>     select t1.a, count(*) from foo t1, foo t2 group by t1.a;
>> I get
>>     a|count
>>     -+-----
>>      |    0
>>     (1 row)

the values returned in the column 'a' NEVER appears in the source table. Is there anyone out there who believes this is
NOTa problem?
 





----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Thomas Lockhart

Date:

22 July 1999, 23:33:38

> >     create table foo (a int);
> >     select t1.a, count(*) from foo t1, foo t2 group by t1.a;
> > I get
> >     a|count
> >     -+-----
> >      |    0
> >     (1 row)
> > instead of zero rows.
> It's not a bug, it's a feature ... or at least there are some around
> here who claim that the behavior is OK.  I think they're wrong, but
> if you want it changed you'll need to cite chapter and verse from the
> SQL92 standard, not just assert that Informix does it differently.

I don't recall which way I argued before (in fact, I don't recall this
particular example), but I do remember arguing (with righteous
conviction) that the query
 select count(*) from foo;

should return a single row containing a zero value. Did we infer from
that some behavior for "group by" (I can't recall any)? istm, at least
today, that the behavior for the group-by is wrong, but we'd better
not change the behavior of my example query...
                  - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Lamar Owen

Date:

23 July 1999, 10:16:45

Philip Warner wrote:

> >>Any idea where I can find a copy of the SQL92 standard on the net?

> >I'd like an answer to this, too :)

> I have found a US based organization that sell 38MB file for $220...I guess I'll go to a library.

Go to http://www.contrib.andrew.cmu.edu/~shadow/sql.html for a good, if
a little out of date, SQL page that lists several reference works.

Lamar Owen
WGCR Internet Radio

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Tom Lane

Date:

23 July 1999, 10:50:45

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> I don't recall which way I argued before (in fact, I don't recall this
> particular example), but I do remember arguing (with righteous
> conviction) that the query
>   select count(*) from foo;
> should return a single row containing a zero value.

No argument about that one.  It's the GROUP BY case that's at issue.

> Did we infer from
> that some behavior for "group by" (I can't recall any)? istm, at least
> today, that the behavior for the group-by is wrong,

IIRC, you were the main advocate of the position that the code's
existing behavior is correct.  Does that mean I can go change it? ;-)
        regards, tom lane

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Thomas Lockhart

Date:

23 July 1999, 12:30:46

> IIRC, you were the main advocate of the position that the code's
> existing behavior is correct.  Does that mean I can go change it? ;-)

Yes, after you slap me around a bit for being so wrong. Do you
remember when we were discussing it? I want to go back and see why I
thought this was right. I'm guessing that the example was not phrased
in exactly this way, and that there may be some other behavior we need
to maintain. (Otherwise, I might have used up my "one wrong idea per
year" ;)
                   - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

Tom Lane

Date:

23 July 1999, 18:46:50

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> IIRC, you were the main advocate of the position that the code's
>> existing behavior is correct.  Does that mean I can go change it? ;-)

> Yes, after you slap me around a bit for being so wrong. Do you
> remember when we were discussing it? I want to go back and see why I
> thought this was right. I'm guessing that the example was not phrased
> in exactly this way, and that there may be some other behavior we need
> to maintain. (Otherwise, I might have used up my "one wrong idea per
> year" ;)

Actually, it may be my recollection that's wrong.  The only discussion
of the point that I can find right now is the thread "SUM() and GROUP
BY" from around 1/12/99 in pghackers, and it seems to be mostly focused
on arguments about whether you should get NULL or 0 from a no-input
SUM...

I would've sworn I remember a couple of other related threads in the
past year or so, but I cannot find them now.

Anyway, unless someone speaks up in favor of the way the code currently
works, I will see about changing the results for the GROUP-BY-with-no-
input-rows case.  I got a few other things to do first though.
        regards, tom lane

Re: [HACKERS] Phantom row from aggregate in self-join in 6.5

From

"D'Arcy" "J.M." Cain

Date:

24 July 1999, 08:01:01

Thus spake Tom Lane
> IIRC, you were the main advocate of the position that the code's
> existing behavior is correct.  Does that mean I can go change it? ;-)

I vote (again) for yes.  It's so annoying having to add code to test
each returned value against 0 just in case there is only one returned
value that needs to be checked.

"SELECT COUNT(*) FROM table" should always return one row but "SELECT
COUNT(*) FROM table GROUP BY somethng" returns a variable number of
rows anyway so what's the harm in extending "one or more" to "zero
or more" returned rows?  We have to iterate through whatever the
count is anyway in our programs.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.