Thread: Group By Dilemma
Hello, I am trying to do a GROUP BY so that I can get a count of messages with the same subject. The problem is, the optimizer won't let me group by just one column, when I am selecting several columns. I suppose Postgres doesn't know which rows to leave out and which ones to display, but I'm hoping someone can offer a solution.... SELECT fld_mailid, fld_mail_date, fld_mail_is_followup, fld_mail_from, fld_mail_subject, <-- Group only on this, so I can get a count count(*) FROM tbl_mail_archive WHERE fld_mail_list=1 AND fld_mail_year=1999 AND fld_mail_month=06 group by fld_mail_subject ORDER BY fld_mail_date DESC LIMIT 26 OFFSET 0; Tim Perdue PHPBuilder.com / GotoCity.com / Geocrawler.com
I want to do a query that is like the icons on the top of http://slashdot.org. That is I want to select the 5 most recent distinct categories from a bunch of stories. So I have... SELECT DISTINCT category.oid, category.title, category.image FROM story, category* WHERE story.category = category.oid AND story.approved ORDER BY datetime DESC LIMIT 5; The trouble is it doesn't return distinct results, but rather it returns duplicates. Any ideas how I can do this query? I thought there might be some sub-select solution, but I don't really understand subselects. I thought of, SELECT distinct * from (SELECT category.oid, category.title, category.image FROM story, category* WHERE story.category = category.oid AND story.approved ORDER BY datetime DESC )LIMIT 5; But that doesn't work in any shape or form.
"Tim Perdue" wrote: >Hello, I am trying to do a GROUP BY so that I can get a count of messages >with the same subject. Theproblem is, the optimizer won't let me group by >just one column, when I am selecting several columns. You can only have the grouped column and aggregates in your query. >I suppose Postgres doesn't know which rows to leave out and which ones to >display, but I'm hoping someone can offer asolution.... > >SELECT >fld_mailid, >fld_mail_date, >fld_mail_is_followup, >fld_mail_from, >fld_mail_subject, <-- Grouponly on this, so I can get a count >count(*) >FROM tbl_mail_archive >WHERE fld_mail_list=1 AND >fld_mail_year=1999 AND>fld_mail_month=06 >group by fld_mail_subject >ORDER BY fld_mail_date DESC >LIMIT 26 OFFSET 0; What do you expect to see in the other columns? Why are you including them at all? -- Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver PGP key from public servers; key ID32B8FAA1 ======================================== "But without faith it is impossible to please him;for he that cometh to God must believe that he is, and that he is a rewarder of them that diligently seek him." Hebrews 11:6
Chris Bitmead <chris.bitmead@bigfoot.com> writes: > SELECT DISTINCT category.oid, category.title, category.image FROM story, > category* WHERE story.category = category.oid AND story.approved ORDER > BY datetime DESC LIMIT 5; > The trouble is it doesn't return distinct results, but rather it returns > duplicates. This is a known problem --- SELECT DISTINCT implies sorting by the fields that are being "distincted" on, since the actual duplicate- elimination relies on a uniq(1)-like adjacent-duplicates filter. Forcing a sort by a different field breaks the duplicate eliminator. It is not real clear what the correct behavior is, which is why nothing's been done about it; we've gone round on the issue a couple of times (see the pghackers archives). My own thought is that the query as given above should be illegal, since there is no unique value of datetime to go with a "distinct" set of oid, title, image. You might consider using GROUP BY rather than DISTINCT if you want to order the results in a particular way, saySELECT oid, title, image ... GROUP BY oid, title, image ORDER BY min(datetime) where you use an aggregate like min or max to resolve the ambiguity about which datetime to associate with a particular group... regards, tom lane
Tom Lane wrote: > It is not real clear what the correct behavior is, which is why > nothing's been done about it; we've gone round on the issue a couple > of times (see the pghackers archives). My own thought is that the > query as given above should be illegal, since there is no unique > value of datetime to go with a "distinct" set of oid, title, image. Are yes I remember the issues now. I guess it would be nice if there was some clause to specify which to elminate. Like SELECT DISTINCT ELIMINATING LAST which then uses the ORDER BY clause to decide whether to eliminate the first or last. But I can see that would be fiddly to implement. > > You might consider using GROUP BY rather than DISTINCT if you want > to order the results in a particular way, say > SELECT oid, title, image ... > GROUP BY oid, title, image ORDER BY min(datetime) I'll give it a go. > where you use an aggregate like min or max to resolve the ambiguity > about which datetime to associate with a particular group... > > regards, tom lane -- Chris Bitmead http://www.bigfoot.com/~chris.bitmead mailto:chris.bitmead@bigfoot.com
Tom Lane wrote: > You might consider using GROUP BY rather than DISTINCT if you want > to order the results in a particular way, say > SELECT oid, title, image ... > GROUP BY oid, title, image ORDER BY min(datetime) I tried the following query and it crashed the backend. (CVS of a couple of days ago). SELECT category.oid, category.title, category.image FROM story, category* WHERE story.category = category.oid AND story.approved GROUP BY category.oid, category.title, category.image ORDER BY min(datetime); pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. We have lost the connection to the backend, so further processing is impossible. Terminating. -- Chris Bitmead http://www.bigfoot.com/~chris.bitmead mailto:chris.bitmead@bigfoot.com
Chris Bitmead <chris.bitmead@bigfoot.com> writes: > SELECT category.oid, category.title, category.image FROM story, > category* WHERE story.category = category.oid AND story.approved GROUP > BY category.oid, category.title, category.image ORDER BY min(datetime); > pqReadData() -- backend closed the channel unexpectedly. > This probably means the backend terminated abnormally > before or while processing the request. > We have lost the connection to the backend, so further processing is > impossible. Terminating. This is a known problem --- see my pghackers email of a few days ago, "inherited GROUP BY is busted". regards, tom lane