Thread: Count dates distinct within an interval
Hi, everybody! I was trying to formulate a sql query, that I now think is impossible :-( I just thought, I'd run it by you before I give up - perhaps, you guys would have some ideas... Suppose, I have a table like this create table test ( stuff int, stamp timestamp ); Now, I want to count the occurences of each value of stuff in the table, but so that entries within 24 hours from each other count as one... The closest I could think of is: select stuff, count (distinct date_trunc ('day', stamp)) from test group by stuff; This doesn't do exactly what I need though - for example, if one entry is one minute before midnight, and the other one is two minutes later, they'd get counted as 2, and what I am looking for is the way to get them collapsed into one as long as they are less then 24 hours apart... Now, I am pretty sure, it is simply impossible to do what I want with count (distinct...) because my 'equality' is not transitive - for example, three entries, like A = 2001 - 01- 01 20:20:00 B = 2001 - 01 - 02 20:19:00 C = 2001 - 01 - 02 20:21:00 Should be counted as *two* (A === B, and B === C, but *not* A === C)... Also, I could certainly write a simple function, that would get all the entries in order, and scan through them, counting according to my rules... But I was hoping to find some way to do this in plain sql though... Any ideas? Thanks! Dima
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Now, I want to count the occurences of each value of stuff in the table, > but so that entries within 24 hours from each other count as one... >... > A = 2001 - 01- 01 20:20:00 > B = 2001 - 01 - 02 20:19:00 > C = 2001 - 01 - 02 20:21:00 > Should be counted as *two* (A === B, and B === C, but *not* A === C)... You need to elaborate on your logic some more, and state exactly what you would want in the A,B,C case above. Does B get lumped with A or with C? It is within 24 hours of both, after all. Does C not get lumped in with B simply because B has already been lumped in with A? Perhaps showing us the sample output of an ideal query would be best, along with some sample rows (e.g. use real insert statements) - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200307151045 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE/FBS9vJuQZxSWSsgRAnk3AJ0bqyDk6iZWqSZuHfZslFCjxwl7fgCfaZ7r XdwpPsO4OaTa9YpjmXx1hmA= =IFRz -----END PGP SIGNATURE-----
> > >You need to elaborate on your logic some more, and state exactly what you >would want in the A,B,C case above. Does B get lumped with A or with C? >It is within 24 hours of both, after all. Does C not get lumped in with >B simply because B has already been lumped in with A? > Yes. The first (earliest) entry is counted, then all the later ones a skipped as long as they are within 24 hours from the last one, that's counted. In this case, A is counted, B is skipped, because it is within 24 hours of A, then C is counted, because it is more than 24 hours from A. Thanks! Dima
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Also, I could certainly write a simple function, that would get all the > entries in order, and scan through them, counting according to my rules... > But I was hoping to find some way to do this in plain sql though... In this example, you are best off using a function. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200307151137 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE/FCAgvJuQZxSWSsgRAg1jAJ9kS9PpIiMkij6TtOg63O59TeezPACgzhMF ZM/84SEPP4doDR8fsGpnUBU= =w5Wa -----END PGP SIGNATURE-----
Dmitry Tkach wrote: > Hi, everybody! > > I was trying to formulate a sql query, that I now think is impossible :-( > I just thought, I'd run it by you before I give up - perhaps, you guys > would have some ideas... > > Suppose, I have a table like this > > create table test > ( > stuff int, > stamp timestamp > ); > > Now, I want to count the occurences of each value of stuff in the table, > but so that entries within 24 hours from each other count as one... > The closest I could think of is: > > select stuff, count (distinct date_trunc ('day', stamp)) from test group > by stuff; > > This doesn't do exactly what I need though - for example, if one entry > is one minute before midnight, and the other one is two minutes later, > they'd get counted as 2, and what I am looking for is the way to get > them collapsed into one as long as they are less then 24 hours apart... > > Now, I am pretty sure, it is simply impossible to do what I want with > count (distinct...) because my 'equality' is not transitive - for > example, three entries, like > A = 2001 - 01- 01 20:20:00 > B = 2001 - 01 - 02 20:19:00 > C = 2001 - 01 - 02 20:21:00 > > Should be counted as *two* (A === B, and B === C, but *not* A === C)... > > Also, I could certainly write a simple function, that would get all the > entries in order, and scan through them, counting according to my rules... > But I was hoping to find some way to do this in plain sql though... > > Any ideas? > > Thanks! > > Dima > You would probably be able to speed the following up using immutable funtions to aid the query, or just a function to do it. However I think this does what you asked in a query. I've put a script at the end. hth, - Stuart -- s is the stuff to group by -- dt is the datetime thing create table Q ( s int4, dt timestamptz); truncate Q; INSERT INTO Q (s,dt) VALUES (1,'2003/01/01'::timestamptz); INSERT INTO Q (s,dt) VALUES (1,'2003/01/02 04:00'::timestamptz); INSERT INTO Q (s,dt) VALUES (1,'2003/01/01 08:00'::timestamptz); INSERT INTO Q (s,dt) VALUES (1,'2003/01/02'::timestamptz); INSERT INTO Q (s,dt) VALUES (1,'2003/01/05 23:00'::timestamptz); INSERT INTO Q (s,dt) VALUES (1,'2003/01/06 22:00'::timestamptz); INSERT INTO Q (s,dt) VALUES (1,'2003/01/07 05:00'::timestamptz); INSERT INTO Q (s,dt) VALUES (3,'2003/01/01'::timestamptz); INSERT INTO Q (s,dt) VALUES (4,'2003/01/01'::timestamptz); INSERT INTO Q (s,dt) VALUES (4,'2003/01/02 05:00'::timestamptz); INSERT INTO Q (s,dt) VALUES (4,'2003/01/03 04:00'::timestamptz); SELECT COUNT(*),s FROM Q WHERE dt IN (SELECT min(R.dt) FROM Q AS R WHERE Q.s=R.s) OR dt IN (SELECT min(P.dt) FROM Q AS P WHERE P.s=Q.s AND P.dt>=date_trunc('day',Q.dt)- CASE WHEN (SELECT max(dt)::time FROM Q AS T WHERE Q.s=T.s AND T.dt<Q.dt AND NOT EXISTS (SELECT * FROM Q AS U WHERE T.dt-'1 day'::interval<U.dt AND T.dt>U.dt AND U.s=Q.s))>Q.dt::time THEN '1 day'::interval ELSE '0 day'::interval END +(SELECT max(dt)::time FROM Q AS T WHERE Q.s=T.s AND T.dt<Q.dt AND NOT EXISTS (SELECT * FROM Q AS U WHERE T.dt-'1 day'::interval<U.dt AND T.dt>U.dt AND U.s=Q.s))) GROUP BY s;