Re: get distinct + group by then filter - Mailing list pgsql-sql

From Filip Rembiałkowski
Subject Re: get distinct + group by then filter
Date
Msg-id 92869e660912180420h548ec7dak2f85f0780a8d0318@mail.gmail.com
Whole thread Raw
In response to get distinct + group by then filter  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Responses Re: get distinct + group by then filter  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
List pgsql-sql


2009/12/17 Ivan Sergio Borgonovo <mail@webthatworks.it>
I've a web application and I'm trying to do some reporting on
affiliate commission

create table tracky_hit (
 hitid serial,
 esid varchar(32), -- related to browser session
 track_time timestamp,
 aid varchar(32), -- affiliate code
 -- some other tracking stuff
);

create table tracky_event (
 eventid serial,
 esid varchar(32) references tracky_hit (esid)

This imples that tracky_hit.esid is at least UNIQUE.

);

create table tracky_ordergroup_event (
 ordergroupid int references ...,
 eventid int references tracky_event (eventid)
);

Now I'd like to pick up the first hit for each esid in a given
interval of time for a given aid and relate them with ordergroupid.

aid may change across the same esid.

If  tracky_hit.esid is unique,  then why same esid can have many aids?

Can you specify more complete schema (at least PKeys would be nice)?

 


Getting the first hit for each esid can be done:

select min(hitid) as h
 from tracky_hit
   group by esid;

or

select distinct on (esid) hitid
 from tracky_hit
   order by esid, track_time;

DISTINCT ON seems a good aproach tu such queries.

 

If I put a where aid='somestuff' right in the above query... I'm not
picking up the first hit in an esid.

The only way that comes to my mind to solve the problem is applying
the condition later in a subquery, but no conditions means a lot of
data returned.

I've a similar problem with the interval: if I chop in the middle of
a session I may not pick up the beginning of each session.
Furthermore I've to count session just once even if they cross the
boundary of an interval.

I could do something like:

select oe.ordergroupid from
 tracky_ordergroup_event oe
 join tracky_event e on e.eventid=oe.eventid
 join tracky_hit th on th.esid=e.esid
 where th.hitid in
   (select distinct on (esid) hitid
     from tracky_hit
     where track_time between
       ('2009-12-01'::timestamp - interval '1 days')
       and
       ('2009-12-01'::timestamp + interval '1 months' + interval '1
         days')
     order by esid, track_time
   )
 and th.aid='someaid'
 and th.track_time between
         ('2009-12-01'::timestamp)
       and
         ('2009-12-01'::timestamp + interval '1 months');

but this looks awful. Any better way?

I'm on 8.3 and no short term plan to move to 8.4

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql



--
Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/

pgsql-sql by date:

Previous
From: Ivan Sergio Borgonovo
Date:
Subject: get distinct + group by then filter
Next
From: Ivan Sergio Borgonovo
Date:
Subject: Re: get distinct + group by then filter