Thread: Count dates distinct within an interval

Count dates distinct within an interval

From
Dmitry Tkach
Date:
Hi, everybody!

I was trying to formulate a sql query, that I now think is impossible :-(
I just thought, I'd run it by you before I give up - perhaps, you guys 
would have some ideas...

Suppose, I have a table like this

create table test
(  stuff int,  stamp timestamp
);

Now, I want to count the occurences of each value of stuff in the table, 
but so that entries within 24 hours from each other count as one...
The closest I could think of is:

select stuff, count (distinct date_trunc ('day', stamp)) from test group 
by stuff;

This doesn't do exactly what I need though - for example, if one entry 
is one minute before midnight, and the other one is two minutes later, 
they'd get counted  as 2, and what I am looking for is the way to get 
them collapsed into one as long as they are less then 24 hours apart...

Now, I am pretty sure, it is simply impossible to do what I want with 
count (distinct...) because my 'equality' is not transitive - for 
example, three entries, like
A = 2001 - 01- 01 20:20:00
B = 2001 - 01 - 02 20:19:00
C = 2001 - 01 - 02 20:21:00

Should be counted as *two* (A === B, and B === C, but *not* A === C)...

Also, I could certainly write a simple function, that would get all the 
entries in order, and scan through them, counting according to my rules...
But I was hoping to find some way to do this in plain sql though...

Any ideas?

Thanks!

Dima



Re: Count dates distinct within an interval

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



> Now, I want to count the occurences of each value of stuff in the table, 
> but so that entries within 24 hours from each other count as one...
>...
> A = 2001 - 01- 01 20:20:00
> B = 2001 - 01 - 02 20:19:00
> C = 2001 - 01 - 02 20:21:00
> Should be counted as *two* (A === B, and B === C, but *not* A === C)...

You need to elaborate on your logic some more, and state exactly what you 
would want in the A,B,C case above. Does B get lumped with A or with C? 
It is within 24 hours of both, after all. Does C not get lumped in with 
B simply because B has already been lumped in with A?

Perhaps showing us the sample output of an ideal query would be best, 
along with some sample rows (e.g. use real insert statements)


- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200307151045
-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE/FBS9vJuQZxSWSsgRAnk3AJ0bqyDk6iZWqSZuHfZslFCjxwl7fgCfaZ7r
XdwpPsO4OaTa9YpjmXx1hmA=
=IFRz
-----END PGP SIGNATURE-----





Re: Count dates distinct within an interval

From
Dmitry Tkach
Date:
>
>
>You need to elaborate on your logic some more, and state exactly what you 
>would want in the A,B,C case above. Does B get lumped with A or with C? 
>It is within 24 hours of both, after all. Does C not get lumped in with 
>B simply because B has already been lumped in with A?
>
Yes. The first (earliest) entry is counted, then all the later ones a 
skipped as long as they are within 24 hours from the last one, that's 
counted.
In this case, A is counted, B is skipped, because it is within 24 hours 
of A, then C is counted, because it is more than 24 hours from A.

Thanks!

Dima




Re: Count dates distinct within an interval

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Also, I could certainly write a simple function, that would get all the 
> entries in order, and scan through them, counting according to my rules...
> But I was hoping to find some way to do this in plain sql though...

In this example, you are best off using a function.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200307151137

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE/FCAgvJuQZxSWSsgRAg1jAJ9kS9PpIiMkij6TtOg63O59TeezPACgzhMF
ZM/84SEPP4doDR8fsGpnUBU=
=w5Wa
-----END PGP SIGNATURE-----



Re: Count dates distinct within an interval

From
Stuart
Date:
Dmitry Tkach wrote:
> Hi, everybody!
> 
> I was trying to formulate a sql query, that I now think is impossible :-(
> I just thought, I'd run it by you before I give up - perhaps, you guys 
> would have some ideas...
> 
> Suppose, I have a table like this
> 
> create table test
> (
>   stuff int,
>   stamp timestamp
> );
> 
> Now, I want to count the occurences of each value of stuff in the table, 
> but so that entries within 24 hours from each other count as one...
> The closest I could think of is:
> 
> select stuff, count (distinct date_trunc ('day', stamp)) from test group 
> by stuff;
> 
> This doesn't do exactly what I need though - for example, if one entry 
> is one minute before midnight, and the other one is two minutes later, 
> they'd get counted  as 2, and what I am looking for is the way to get 
> them collapsed into one as long as they are less then 24 hours apart...
> 
> Now, I am pretty sure, it is simply impossible to do what I want with 
> count (distinct...) because my 'equality' is not transitive - for 
> example, three entries, like
> A = 2001 - 01- 01 20:20:00
> B = 2001 - 01 - 02 20:19:00
> C = 2001 - 01 - 02 20:21:00
> 
> Should be counted as *two* (A === B, and B === C, but *not* A === C)...
> 
> Also, I could certainly write a simple function, that would get all the 
> entries in order, and scan through them, counting according to my rules...
> But I was hoping to find some way to do this in plain sql though...
> 
> Any ideas?
> 
> Thanks!
> 
> Dima
> 

You would probably be able to speed the following up using immutable
funtions to aid the query, or just a function to do it. However I think
this does what you asked in a query. I've put a script at the end.
hth,
- Stuart
-- s is the stuff to group by
-- dt is the datetime thing
create table Q (
s int4,
dt timestamptz);

truncate Q;

INSERT INTO Q (s,dt) VALUES (1,'2003/01/01'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/02 04:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/01 08:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/02'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/05 23:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/06 22:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/07 05:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (3,'2003/01/01'::timestamptz);
INSERT INTO Q (s,dt) VALUES (4,'2003/01/01'::timestamptz);
INSERT INTO Q (s,dt) VALUES (4,'2003/01/02 05:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (4,'2003/01/03 04:00'::timestamptz);

SELECT COUNT(*),s FROM Q WHERE dt IN (SELECT min(R.dt) FROM Q AS R WHERE 
Q.s=R.s)
OR dt IN (SELECT min(P.dt) FROM Q AS P WHERE P.s=Q.s AND
P.dt>=date_trunc('day',Q.dt)-
CASE WHEN
(SELECT max(dt)::time FROM Q AS T WHERE Q.s=T.s AND T.dt<Q.dt AND NOT 
EXISTS
(SELECT * FROM Q AS U WHERE T.dt-'1 day'::interval<U.dt AND T.dt>U.dt 
AND U.s=Q.s))>Q.dt::time THEN '1 day'::interval
ELSE
'0 day'::interval
END
+(SELECT max(dt)::time FROM Q AS T WHERE Q.s=T.s AND T.dt<Q.dt AND NOT 
EXISTS
(SELECT * FROM Q AS U WHERE T.dt-'1 day'::interval<U.dt AND T.dt>U.dt 
AND U.s=Q.s)))
GROUP BY s;