Re: Selecting distinct records - Mailing list pgsql-php
From | David Busby |
---|---|
Subject | Re: Selecting distinct records |
Date | |
Msg-id | 009c01c291c7$b617aec0$4000000a@busbydev Whole thread Raw |
In response to | Re: Selecting distinct records ("Dave [Hawk-Systems]" <dave@hawk-systems.com>) |
List | pgsql-php |
Dave, How about a job that runs once that cleans the whole thing? Or: [Warning, pseudo PHPish code] $rs_sid = select distinct (sessionid) from "logs"; while ($s = pg_fetch_object($rs_sid) { $rs_stuff = select * from "logs" where .... and "sessionid" = $s->id for (loop of $rs_stuff) { $total+=$rs_stuff->sessiontime } echo "Session $s->id has $total seconds used"; } This method involves a select for each session (might be slow on millions of records :) ). This might work once or twice but I don't see it as a suitable solution if this has to happen on a daily basis. I know that the latest PostgreSQL does nested selects just fine (I've got one paticular statement that has three nested selects ) and it's performance isn't as bad as you'd think, well at least after the query analyzer has seen it once and it gets cached. /B ----- Original Message ----- From: "Dave [Hawk-Systems]" <dave@hawk-systems.com> To: "David Busby" <busby@pnts.com>; <pgsql-php@postgresql.org> Sent: Thursday, November 21, 2002 13:26 Subject: RE: Selecting distinct records > > Try: > >select distinct on (sessionid) sum(sessiontime) from logs > >where name='joeblowuser' and datetime > 1036040400 > >group by sessionid. > > sorry, I wasn't clear... the problem arises when we need to sort the data for > output. postgres demands that the distinct items be first in the ordering > process, which would not allow display by session, name etc... > > the obvious result would be to nest the select statements but can't seem to get > that to work either (working with version 7.0.3) > > eg/ select * from (select distinct on (sessionid) from logs where....) as > tempname order by sessiondate desc > > gives me an error on the second select, so not sure that is a workable solution > either. > > >sum() is an aggerate (sp?) function, that means it munges a field > >(sessiontime) from multiple records into one field in one record. Since you > >are also selecting sessionid (from mulitple records) you need to munge it > >some how, that munge is accomplished via 'group by'. From your previous > >e-mail it seems that (IMHO) the real problem is that duplicates are getting > >inserted via external hardware interaction, this select might be a bandage > >on a wound whose true size isn't known... > > agreed, and that problem has been corrected, but we are dealing with close to a > million records which have these duplicates strewn about within... rather > annoying, looking for a bandaid in teh select to avoid intensive post-select > processing of the output. > > Dave >