Thread: prob with aggregate and group by - returns multiples

prob with aggregate and group by - returns multiples

From
George Dau
Date:
I have a group by that groups some, but not all, identical rows. Here are
the details:
I can reproduce the problem using one single table, details below;

Table    = hits
+----------------------------------+----------------------------------+-----
--+
|              Field               |              Type                |
Length|
+----------------------------------+----------------------------------+-----
--+
| userid                           | varchar                          |
12 |
| dat                              | date                             |
4 |
| tim                              | time                             |
8 |
| ipa                              | int4                             |
4 |
| ipb                              | int4                             |
4 |
| ipc                              | int4                             |
4 |
| ipd                              | int4                             |
4 |
| site                             | varchar                          |
50 |
+----------------------------------+----------------------------------+-----
--+

I want a report showing how many occurrences of "site" there are for each
distinct "site".
There is a lot of data, in there, so I'll look at a particular example of
the problem. When
I run this query:

select count(site), site from hitsgroup by site;

The output contains lines like the following. Note that these are all
consecutive in the output.
   2|xlink.zdnet.com   2|xlink.zdnet.com   1|xlink.zdnet.com   2|xlink.zdnet.com   2|xlink.zdnet.com
2|xlink.zdnet.com  3|xlink.zdnet.com   2|xlink.zdnet.com   3|xlink.zdnet.com   2|xlink.zdnet.com   1|xlink.zdnet.com
 

Suspecting that there were differences in each xlink.zdnet.com, I counted
just them:

web=> select count (*) from hits where site='xlink.zdnet.com';
count
-----  22
(1 row)

So, all 22 xlink.zdnet.com are selected above, but they have not grouped in
the previous
one. Any ideas why?

On a lesser note: I tried "select * into temp from hits" as per the doco,
but it barfs. Looks
like the syntax has changed. Any current "railway" diagrams for the commands
anywhere? Also,
where can I find out about other environment settings like PGDATESTYLE? I
know about that one,
but what others are there?

I'm trying to port from Oracle. Any guides? There are many differences in
the SQL.


**************************************************************
The information contained in this E-Mail is confidential
and is intended only for the use of the addressee(s).
If you receive this E-Mail in error, any use, distribution
or copying of this E-Mail is not permitted.  You are 
requested to forward unwanted E-Mail and address any problems
to the MIM Holdings Limited Help Desk. 
E-Mail: helpdesk@mim.com.au or phone: Australia 07 3833 8042.
**************************************************************


Re: [SQL] prob with aggregate and group by - returns multiples

From
Tom Lane
Date:
George Dau <gedau@isa.mim.com.au> writes:
> select count(site), site from hits group by site;

> The output contains lines like the following. Note that these are all
> consecutive in the output.

>     2|xlink.zdnet.com
>     2|xlink.zdnet.com
>     1|xlink.zdnet.com
>     2|xlink.zdnet.com
>     2|xlink.zdnet.com
>     2|xlink.zdnet.com
>     3|xlink.zdnet.com
>     2|xlink.zdnet.com
>     3|xlink.zdnet.com
>     2|xlink.zdnet.com
>     1|xlink.zdnet.com

> Suspecting that there were differences in each xlink.zdnet.com, I counted
> just them:

> web=> select count (*) from hits where site='xlink.zdnet.com';
> count
> -----
>    22
> (1 row)

> So, all 22 xlink.zdnet.com are selected above, but they have not grouped in
> the previous one. Any ideas why?

Wow, that is bizarre.  My first thought was that you had varying numbers
of trailing blanks in the "site" values, but your second example seems
to disprove that theory.

What Postgres version are you running, and on what platform?  Can you
generate a self-contained example (a script that demonstrates the error
from a standing start)?  I suspect you may be hitting a platform-
specific porting problem, but it's just speculation unless we have a
self-contained test case to try on other machines.

> On a lesser note: I tried "select * into temp from hits" as per the doco,
> but it barfs.

Postgres thinks that TEMP is a keyword, so it won't take it as a table
name unless you put quotes around it.  If we have doco examples that
use TEMP as a table name, they need to be changed --- do you recall
where you saw that, exactly?
        regards, tom lane


Re: [SQL] prob with aggregate and group by - returns multiples

From
Peter Eisentraut
Date:
Tom Lane writes:

> > On a lesser note: I tried "select * into temp from hits" as per the doco,
> > but it barfs.
> 
> Postgres thinks that TEMP is a keyword, so it won't take it as a table
> name unless you put quotes around it.

This is really an unfortunate case where someone should have read the SQL
standard before putting in a feature. The SQL keyword is TEMPORARY, and
TEMP is really a popular name for a dummy table.

I tried making TEMP a ColId but it croaks on this syntactic contruct:

SELECT xxx INTO [TEMP] [TABLE] tablename

which is incidentally not SQL either. If someone is interested in allowing
'temp' as an identifier, there doesn't seem to be a good way without
requiring the TABLE keyword above. Would that be worth it for 7.0 maybe?

Of course the documentation should be changed to TEMPORARY as well in
various places.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiples

From
Thomas Lockhart
Date:
> > Postgres thinks that TEMP is a keyword, so it won't take it as a table
> > name unless you put quotes around it.
> This is really an unfortunate case where someone should have read the SQL
> standard before putting in a feature. The SQL keyword is TEMPORARY, and
> TEMP is really a popular name for a dummy table.

So why not yank TEMP and require TEMPORARY? Saving an extra 5
characters of typing is not a good enough reason to keep it imho, and
if the SQL92 standard requires a particular form why bother extending
it?

A major release is a good time to adjust syntax to promote
compliance...
                     - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> This is really an unfortunate case where someone should have read the SQL
>> standard before putting in a feature. The SQL keyword is TEMPORARY, and
>> TEMP is really a popular name for a dummy table.

> So why not yank TEMP and require TEMPORARY?

Probably we ought to stop to ask why TEMP is in there to begin with;
perhaps for compatibility with some other RDBMS?

If not, I'd vote for pulling it out.  That's a heck of a poor word to
reserve.
        regards, tom lane


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Bruce Momjian
Date:
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> >> This is really an unfortunate case where someone should have read the SQL
> >> standard before putting in a feature. The SQL keyword is TEMPORARY, and
> >> TEMP is really a popular name for a dummy table.
> 
> > So why not yank TEMP and require TEMPORARY?
> 
> Probably we ought to stop to ask why TEMP is in there to begin with;
> perhaps for compatibility with some other RDBMS?

It was me.  Informix uses it.

> 
> If not, I'd vote for pulling it out.  That's a heck of a poor word to
> reserve.

I am afraid of lots of user complaints, even if we had not already used
TEMP.


--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Thomas Lockhart
Date:
> > If not, I'd vote for pulling it out.  That's a heck of a poor word to
> > reserve.
> I am afraid of lots of user complaints, even if we had not already used
> TEMP.

OK, but we've already got "user complaints" about TEMP being a
reserved word, so that part seems to balance out. There is apparently
no basis in published standards for TEMP being a reserved word. And
btw it is not currently documented as a reserved word in
syntax.sgml...
                    - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiples

From
Peter Eisentraut
Date:
On Tue, 29 Feb 2000, Thomas Lockhart wrote:

> > > Postgres thinks that TEMP is a keyword, so it won't take it as a table
> > > name unless you put quotes around it.
> > This is really an unfortunate case where someone should have read the SQL
> > standard before putting in a feature. The SQL keyword is TEMPORARY, and
> > TEMP is really a popular name for a dummy table.
> 
> So why not yank TEMP and require TEMPORARY? Saving an extra 5
> characters of typing is not a good enough reason to keep it imho, and
> if the SQL92 standard requires a particular form why bother extending
> it?
> 
> A major release is a good time to adjust syntax to promote
> compliance...

I've been (lightly) bashed in the past for proposing such things (see
END/ABORT) but I'm with you. I think that TEMP may be far too wide-spread
by now, though.

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returnsmultiples

From
Thomas Lockhart
Date:
> > A major release is a good time to adjust syntax to promote
> > compliance...
> I've been (lightly) bashed in the past for proposing such things (see
> END/ABORT) but I'm with you. I think that TEMP may be far too wide-spread
> by now, though.

Well, imho the TEMP issue is not identical to END/ABORT. For TEMP, we
are unnecessarily restricting the space of possible identifiers,
eliminating a common and obvious name. The fix is trivial, and the
affected parties are *only* those who use temporary tables and who
chose to *not* use SQL92 syntax, which was always available.
                   - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Don Baccus
Date:
At 06:48 AM 2/29/00 +0000, Thomas Lockhart wrote:
>> > If not, I'd vote for pulling it out.  That's a heck of a poor word to
>> > reserve.
>> I am afraid of lots of user complaints, even if we had not already used
>> TEMP.
>
>OK, but we've already got "user complaints" about TEMP being a
>reserved word, so that part seems to balance out. There is apparently
>no basis in published standards for TEMP being a reserved word. And
>btw it is not currently documented as a reserved word in
>syntax.sgml...

I vote for the SQL92 TEMPORARY.  Let's not add a keyword that is non-standard
just because one or another commercial database makes use of it, unless
there's some real functionality to be gained that's not covered by the
standard.

TEMP is covered in SQL92 by TEMPORARY.

As an example of when adopting a construct from another commercial database
makes sense to me, SEQUENCE and SERIAL are both convenient means of generating
unique keys that have no equivalent in the standard.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


>>>> If not, I'd vote for pulling it out.  That's a heck of a poor word to
>>>> reserve.
>> I am afraid of lots of user complaints, even if we had not already used
>> TEMP.

> OK, but we've already got "user complaints" about TEMP being a
> reserved word, so that part seems to balance out. There is apparently
> no basis in published standards for TEMP being a reserved word. And
> btw it is not currently documented as a reserved word in
> syntax.sgml...

The real problem is not that we accept TEMP as a synonym for TEMPORARY;
it is that we treat TEMP as a reserved word.  What are the chances that
we could make it a member of the ColId list?  I am thinking that
"... INTO TEMP temp" is *not* ambiguous given one token lookahead...
        regards, tom lane


Don Baccus <dhogaza@pacifier.com> writes:
> I vote for the SQL92 TEMPORARY.  Let's not add a keyword that is non-standard
> just because one or another commercial database makes use of it,

You're missing the point: we are not talking about *adding* a keyword,
we're talking about *removing* one that we've already supported for
a year or so.  That changes matters considerably, IMHO.

I have in fact been able to make a conflict-free grammar in which TEMP
is accepted but not reserved.  It requires a certain amount of
redundancy in the productions (see below), but I think this is a
worthwhile tradeoff for not breaking existing user code.

Shall I commit this?
        regards, tom lane


/** Redundancy here is needed to avoid shift/reduce conflicts,* since TEMP is not a reserved word.  See also OptTemp.**
Theresult is a cons cell (not a true list!) containing* a boolean and a table name.*/
 
OptTempTableName:  TEMPORARY opt_table relation_name               { $$ = lcons(makeInteger(TRUE), (List *) $3); }
    | TEMP opt_table relation_name               { $$ = lcons(makeInteger(TRUE), (List *) $3); }           | LOCAL
TEMPORARYopt_table relation_name               { $$ = lcons(makeInteger(TRUE), (List *) $4); }           | LOCAL TEMP
opt_tablerelation_name               { $$ = lcons(makeInteger(TRUE), (List *) $4); }           | GLOBAL TEMPORARY
opt_tablerelation_name               {                   elog(ERROR, "GLOBAL TEMPORARY TABLE is not currently
supported");                  $$ = lcons(makeInteger(TRUE), (List *) $4);               }           | GLOBAL TEMP
opt_tablerelation_name               {                   elog(ERROR, "GLOBAL TEMPORARY TABLE is not currently
supported");                  $$ = lcons(makeInteger(TRUE), (List *) $4);               }           | TABLE
relation_name              { $$ = lcons(makeInteger(FALSE), (List *) $2); }           | relation_name               {
$$= lcons(makeInteger(FALSE), (List *) $1); }       ;
 


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Peter Eisentraut
Date:
On Tue, 29 Feb 2000, Don Baccus wrote:

> I vote for the SQL92 TEMPORARY.  Let's not add a keyword that is non-standard
> just because one or another commercial database makes use of it, unless
> there's some real functionality to be gained that's not covered by the
> standard.

The difference is that TEMP is already a keyword since 6.5 and we're
considering removing it.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Bruce Momjian
Date:
> > > If not, I'd vote for pulling it out.  That's a heck of a poor word to
> > > reserve.
> > I am afraid of lots of user complaints, even if we had not already used
> > TEMP.
> 
> OK, but we've already got "user complaints" about TEMP being a
> reserved word, so that part seems to balance out. There is apparently
> no basis in published standards for TEMP being a reserved word. And
> btw it is not currently documented as a reserved word in
> syntax.sgml...

OK, I certainly didn't look at the standard to when I implemented TEMP
tables.  In fact, I was surprised it worked considering it is just a
hack on the cache code.

Let's forget I made a mistake, and consider how many people are going to
think they should use TEMP and how many TEMPORARY.  I personally would
guess TEMP and never TEMPORARY.  I wonder if others would too.

So are we willing to field questions from people trying to use TEMP
tables and trying TEMP and not TEMPORARY.  I realize the restriction on
a field called TEMP, but we don't get those very often.  How many people
are going to guess TEMP and not TEMPORARY?

Of course, as a Unix guy, I may have guessed TMP too.  :-)

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Bruce Momjian
Date:
Looks good to me.

> Don Baccus <dhogaza@pacifier.com> writes:
> > I vote for the SQL92 TEMPORARY.  Let's not add a keyword that is non-standard
> > just because one or another commercial database makes use of it,
> 
> You're missing the point: we are not talking about *adding* a keyword,
> we're talking about *removing* one that we've already supported for
> a year or so.  That changes matters considerably, IMHO.
> 
> I have in fact been able to make a conflict-free grammar in which TEMP
> is accepted but not reserved.  It requires a certain amount of
> redundancy in the productions (see below), but I think this is a
> worthwhile tradeoff for not breaking existing user code.
> 
> Shall I commit this?
> 
>             regards, tom lane
> 
> 
> /*
>  * Redundancy here is needed to avoid shift/reduce conflicts,
>  * since TEMP is not a reserved word.  See also OptTemp.
>  *
>  * The result is a cons cell (not a true list!) containing
>  * a boolean and a table name.
>  */
> OptTempTableName:  TEMPORARY opt_table relation_name
>                 { $$ = lcons(makeInteger(TRUE), (List *) $3); }
>             | TEMP opt_table relation_name
>                 { $$ = lcons(makeInteger(TRUE), (List *) $3); }
>             | LOCAL TEMPORARY opt_table relation_name
>                 { $$ = lcons(makeInteger(TRUE), (List *) $4); }
>             | LOCAL TEMP opt_table relation_name
>                 { $$ = lcons(makeInteger(TRUE), (List *) $4); }
>             | GLOBAL TEMPORARY opt_table relation_name
>                 {
>                     elog(ERROR, "GLOBAL TEMPORARY TABLE is not currently supported");
>                     $$ = lcons(makeInteger(TRUE), (List *) $4);
>                 }
>             | GLOBAL TEMP opt_table relation_name
>                 {
>                     elog(ERROR, "GLOBAL TEMPORARY TABLE is not currently supported");
>                     $$ = lcons(makeInteger(TRUE), (List *) $4);
>                 }
>             | TABLE relation_name
>                 { $$ = lcons(makeInteger(FALSE), (List *) $2); }
>             | relation_name
>                 { $$ = lcons(makeInteger(FALSE), (List *) $1); }
>         ;
> 


--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Christopher Sawtell
Date:
On Wed, 01 Mar 2000, Tom Lane wrote:
> Don Baccus <dhogaza@pacifier.com> writes:
> > I vote for the SQL92 TEMPORARY.  Let's not add a keyword that is non-standard
> > just because one or another commercial database makes use of it,
> 
> You're missing the point: we are not talking about *adding* a keyword,
> we're talking about *removing* one that we've already supported for
> a year or so.  That changes matters considerably, IMHO.
> 
> I have in fact been able to make a conflict-free grammar in which TEMP
> is accepted but not reserved.  It requires a certain amount of
> redundancy in the productions (see below), but I think this is a
> worthwhile tradeoff for not breaking existing user code.
> 
> Shall I commit this?
 Is there not also the possibility of  making this a configure-time option?

./configure --temporary-table=TEMPORARY   # or TEMP as you wish

Default to TEMPORARY in accord with SQL 92.

--
Sincerely etc.,
NAME       Christopher Sawtell - Support Engineer - iOpen Technologies Ltd.CELL PHONE 021 257 4451ICQ UIN
45863470EMAIL     chris @ iopen . co . nz,  csawtell @ xtra . co . nzCNOTES
ftp://ftp.funet.fi/pub/languages/C/tutorials/sawtell_C.tar.gz
---->>> Please refrain from using HTML attachments in e-mails to me. <<<----



Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Thomas Lockhart
Date:
> So are we willing to field questions from people trying to use TEMP
> tables and trying TEMP and not TEMPORARY.  I realize the restriction on
> a field called TEMP, but we don't get those very often.  How many people
> are going to guess TEMP and not TEMPORARY?

Well, that's one reason to stick to a published standard. No guessing
required since one could look it up ;)
                 - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> So are we willing to field questions from people trying to use TEMP
>> tables and trying TEMP and not TEMPORARY.  I realize the restriction on
>> a field called TEMP, but we don't get those very often.  How many people
>> are going to guess TEMP and not TEMPORARY?

> Well, that's one reason to stick to a published standard. No guessing
> required since one could look it up ;)

As of an hour ago, neither one is a reserved word ;-)
        regards, tom lane


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Bruce Momjian
Date:
> > So are we willing to field questions from people trying to use TEMP
> > tables and trying TEMP and not TEMPORARY.  I realize the restriction on
> > a field called TEMP, but we don't get those very often.  How many people
> > are going to guess TEMP and not TEMPORARY?
> 
> Well, that's one reason to stick to a published standard. No guessing
> required since one could look it up ;)

If you survive reading the standard.  :-)

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
Thomas Lockhart
Date:
> As of an hour ago, neither one is a reserved word ;-)

Darn, I was just warming up to a good argument :))
                 - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Re: [SQL] prob with aggregate and group by - returns multiplesh

From
"Moray McConnachie"
Date:
----- Original Message -----
From: Bruce Momjian <pgman@candle.pha.pa.us>
To: Thomas Lockhart <lockhart@alumni.caltech.edu>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; 'pgsql-sql@postgresql.org'
<pgsql-sql@postgreSQL.org>; PostgreSQL Development
<pgsql-hackers@postgreSQL.org>
Sent: Wednesday, March 01, 2000 6:18 AM
Subject: Re: [HACKERS] Re: [SQL] prob with aggregate and group by -
returns multiplesh


> > > So are we willing to field questions from people trying to use
TEMP
> > > tables and trying TEMP and not TEMPORARY.  I realize the
restriction on
> > > a field called TEMP, but we don't get those very often.  How
many people
> > > are going to guess TEMP and not TEMPORARY?
> >
> > Well, that's one reason to stick to a published standard. No
guessing
> > required since one could look it up ;)
>
> If you survive reading the standard.  :-)

If you can even *find* the standard :->

(is there an on-line source - all links I've ever seen 404ed...?)