Thread: select within aggregate?
Hi!! Of course the following is possible in various ways but i wonder if there is an elegant and simple solution: A part from apache log: remote_host varchar(50) request_uri varchar(50) ts timestamp with time zone Now i want to see which host has done how many requests. This is not difficult: SELECT remote_host, count(*) FROM apache_log GROUP BY remote_host; But now i would like to add a column to the result which shows the most recent requested uri for each group. This sounds quite easy at first glance but i see no simple solution. I suppose with max(ts) i could acquire the time of the most recent request within the group, but how do i get the corresponding request_uri without doing subquerys or something like that? Thank you very much! Klaus
On Fri, 6 May 2005 13:34:26 +0530 "Ramakrishnan Muralidharan" <ramakrishnanm@pervasive-postgres.com> wrote: > SELECT abc.remote_host , c , abc.request_uri , a.t FROM abc , ( select remote_host , count(*) as c , max( ts ) as t fromabc group by remote_host ) as a > where a.remote_host = abc.remote_host and abc.ts = a.t I thought about this too. But what happens if there are accidentally two sets with same timestamp in the table? I suppose the outer select would match both of them. But a DISTINCT should help to avoid this. Ok, if you think this is the propper way to meet the requirement i will do so. Thank you for your answer! Klaus
On Fri, May 06, 2005 at 09:25:35 +0200, Vortex <vortex25@gmx.de> wrote: > Hi!! > > Of course the following is possible in various ways but i wonder > if there is an elegant and simple solution: > > A part from apache log: > remote_host varchar(50) > request_uri varchar(50) > ts timestamp with time zone > > Now i want to see which host has done how many requests. This > is not difficult: > > SELECT remote_host, count(*) FROM apache_log GROUP BY remote_host; > > But now i would like to add a column to the result which > shows the most recent requested uri for each group. > This sounds quite easy at first glance but i see no simple solution. > > I suppose with max(ts) i could acquire the time of the > most recent request within the group, but how do i get > the corresponding request_uri without doing subquerys or > something like that? You could use max(ts) as long as you can have two requests occur at the same time. This might be a reasonable assumption, but it might have been better to include a sequence so that each record would be guaranteed to have a unique key. Using DISTINCT ON to get the latest URIs is probably better. You can join that query to the one doing the count. This will probably be more efficient than executing a subquery for each remote host.
Hi create table abc ( remote_host varchar(50),request_uri varchar(50),ts timestamp ); SELECT abc.remote_host , c , abc.request_uri , a.t FROM abc , ( select remote_host , count(*) as c , max( ts ) as t fromabc group by remote_host ) as a where a.remote_host = abc.remote_host and abc.ts = a.t Regards, R.Muralidharan -----Original Message----- From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Vortex Sent: Friday, May 06, 2005 12:56 PM To: pgsql-sql@postgresql.org Subject: [SQL] select within aggregate? Hi!! Of course the following is possible in various ways but i wonder if there is an elegant and simple solution: A part from apache log: remote_host varchar(50) request_uri varchar(50) ts timestamp with time zone Now i want to see which host has done how many requests. This is not difficult: SELECT remote_host, count(*) FROM apache_log GROUP BY remote_host; But now i would like to add a column to the result which shows the most recent requested uri for each group. This sounds quite easy at first glance but i see no simple solution. I suppose with max(ts) i could acquire the time of the most recent request within the group, but how do i get the corresponding request_uri without doing subquerys or something like that? Thank you very much! Klaus ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
On Fri, 6 May 2005 07:49:23 -0500 Bruno Wolff III <bruno@wolff.to> wrote: > You could use max(ts) as long as you can have two requests occur at the > same time. This might be a reasonable assumption, but it might have been > better to include a sequence so that each record would be guaranteed to > have a unique key. Yes, of course. So far it was more a theoretical question. > Using DISTINCT ON to get the latest URIs is probably better. You can > join that query to the one doing the count. This will probably be more > efficient than executing a subquery for each remote host. The DISTINCT ON construction looks quite interesting. It seems to allow some sorting within an group (even althought it's no real aggregate) in general. I can't say if it's really more eficient than the solution suggested by R.Muralidharan. But i will do some tests! Thank you very much! Klaus