Thread: Group By Dilemma

Group By Dilemma

From
"Tim Perdue"
Date:
Hello, I am trying to do a GROUP BY so that I can get a count of messages
with the same subject. The problem is, the optimizer won't let me group by
just one column, when I am selecting several columns.

I suppose Postgres doesn't know which rows to leave out and which ones to
display, but I'm hoping someone can offer a solution....

SELECT
fld_mailid,
fld_mail_date,
fld_mail_is_followup,
fld_mail_from,
fld_mail_subject,   <-- Group only on this, so I can get a count
count(*)
FROM tbl_mail_archive
WHERE fld_mail_list=1 AND
fld_mail_year=1999 AND
fld_mail_month=06
group by fld_mail_subject
ORDER BY fld_mail_date DESC
LIMIT 26 OFFSET 0;

Tim Perdue
PHPBuilder.com / GotoCity.com / Geocrawler.com





Slashdot Query

From
Chris Bitmead
Date:
I want to do a query that is like the icons on the top of
http://slashdot.org. That is I want to select the 5 most recent distinct
categories from a bunch of stories. So I have...

SELECT DISTINCT category.oid, category.title, category.image FROM story,
category* WHERE story.category = category.oid AND story.approved ORDER
BY datetime DESC LIMIT 5;

The trouble is it doesn't return distinct results, but rather it returns
duplicates. Any ideas how I can do this query?

I thought there might be some sub-select solution, but I don't really
understand subselects. I thought of,

SELECT distinct * from (SELECT category.oid, category.title,
category.image FROM story, category* WHERE story.category = category.oid
AND story.approved ORDER BY datetime DESC )LIMIT 5;

But that doesn't work in any shape or form.


Re: [SQL] Group By Dilemma

From
"Oliver Elphick"
Date:
"Tim Perdue" wrote: >Hello, I am trying to do a GROUP BY so that I can get a count of messages >with the same subject.
Theproblem is, the optimizer won't let me group by >just one column, when I am selecting several columns. 
 
You can only have the grouped column and aggregates in your query.
 >I suppose Postgres doesn't know which rows to leave out and which ones to >display, but I'm hoping someone can offer
asolution.... > >SELECT >fld_mailid, >fld_mail_date, >fld_mail_is_followup, >fld_mail_from, >fld_mail_subject,   <--
Grouponly on this, so I can get a count >count(*) >FROM tbl_mail_archive >WHERE fld_mail_list=1 AND >fld_mail_year=1999
AND>fld_mail_month=06 >group by fld_mail_subject >ORDER BY fld_mail_date DESC >LIMIT 26 OFFSET 0;
 

What do you expect to see in the other columns?  Why are you including
them at all?

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver              PGP key from public servers; key
ID32B8FAA1                ========================================    "But without faith it is impossible to please
him;for      he that cometh to God must believe that he is, and      that he is a rewarder of them that diligently seek
    him."       Hebrews 11:6 
 




Re: [SQL] Slashdot Query

From
Tom Lane
Date:
Chris Bitmead <chris.bitmead@bigfoot.com> writes:
> SELECT DISTINCT category.oid, category.title, category.image FROM story,
> category* WHERE story.category = category.oid AND story.approved ORDER
> BY datetime DESC LIMIT 5;

> The trouble is it doesn't return distinct results, but rather it returns
> duplicates.

This is a known problem --- SELECT DISTINCT implies sorting by the
fields that are being "distincted" on, since the actual duplicate-
elimination relies on a uniq(1)-like adjacent-duplicates filter.
Forcing a sort by a different field breaks the duplicate eliminator.

It is not real clear what the correct behavior is, which is why
nothing's been done about it; we've gone round on the issue a couple
of times (see the pghackers archives).  My own thought is that the
query as given above should be illegal, since there is no unique
value of datetime to go with a "distinct" set of oid, title, image.

You might consider using GROUP BY rather than DISTINCT if you want
to order the results in a particular way, saySELECT oid, title, image ...    GROUP BY oid, title, image ORDER BY
min(datetime)
where you use an aggregate like min or max to resolve the ambiguity
about which datetime to associate with a particular group...
        regards, tom lane


Re: [SQL] Slashdot Query

From
Chris Bitmead
Date:
Tom Lane wrote:

> It is not real clear what the correct behavior is, which is why
> nothing's been done about it; we've gone round on the issue a couple
> of times (see the pghackers archives).  My own thought is that the
> query as given above should be illegal, since there is no unique
> value of datetime to go with a "distinct" set of oid, title, image.

Are yes I remember the issues now. I guess it would be nice if there was
some clause to specify which to elminate. Like SELECT DISTINCT
ELIMINATING LAST which then uses the ORDER BY clause to decide whether
to eliminate the first or last. But I can see that would be fiddly to
implement.

> 
> You might consider using GROUP BY rather than DISTINCT if you want
> to order the results in a particular way, say
>         SELECT oid, title, image ...
>                 GROUP BY oid, title, image ORDER BY min(datetime)

I'll give it a go.

> where you use an aggregate like min or max to resolve the ambiguity
> about which datetime to associate with a particular group...
> 
>                         regards, tom lane

-- 
Chris Bitmead
http://www.bigfoot.com/~chris.bitmead
mailto:chris.bitmead@bigfoot.com


Re: [SQL] Slashdot Query

From
Chris Bitmead
Date:
Tom Lane wrote:
> You might consider using GROUP BY rather than DISTINCT if you want
> to order the results in a particular way, say
>         SELECT oid, title, image ...
>                 GROUP BY oid, title, image ORDER BY min(datetime)

I tried the following query and it crashed the backend. (CVS of a couple
of days ago).

SELECT category.oid, category.title, category.image FROM story,
category* WHERE story.category = category.oid AND story.approved GROUP
BY category.oid, category.title, category.image ORDER BY min(datetime);
pqReadData() -- backend closed the channel unexpectedly.       This probably means the backend terminated abnormally
  before or while processing the request.
 
We have lost the connection to the backend, so further processing is
impossible.  Terminating.


-- 
Chris Bitmead
http://www.bigfoot.com/~chris.bitmead
mailto:chris.bitmead@bigfoot.com


Re: [SQL] Slashdot Query

From
Tom Lane
Date:
Chris Bitmead <chris.bitmead@bigfoot.com> writes:
> SELECT category.oid, category.title, category.image FROM story,
> category* WHERE story.category = category.oid AND story.approved GROUP
> BY category.oid, category.title, category.image ORDER BY min(datetime);
> pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally
>         before or while processing the request.
> We have lost the connection to the backend, so further processing is
> impossible.  Terminating.

This is a known problem --- see my pghackers email of a few days ago,
"inherited GROUP BY is busted".
        regards, tom lane