Thread: select within aggregate?

select within aggregate?

From
Vortex
Date:
Hi!!

Of course the following is possible in various ways but i wonder
if there is an elegant and simple solution:

A part from apache log:
remote_host    varchar(50)
request_uri    varchar(50)
ts             timestamp with time zone

Now i want to see which host has done how many requests. This
is not difficult:

SELECT remote_host, count(*) FROM apache_log GROUP BY remote_host;

But now i would like to add a column to the result which
shows the most recent requested uri for each group.
This sounds quite easy at first glance but i see no simple solution.

I suppose with max(ts) i could acquire the time of the
most recent request within the group, but how do i get
the corresponding request_uri without doing subquerys or
something like that?

Thank you very much!

Klaus


Re: select within aggregate?

From
Vortex
Date:
On Fri, 6 May 2005 13:34:26 +0530
"Ramakrishnan Muralidharan" <ramakrishnanm@pervasive-postgres.com> wrote:

> SELECT abc.remote_host ,  c , abc.request_uri , a.t FROM abc , ( select remote_host , count(*) as c , max( ts ) as t
fromabc group by remote_host ) as a 
 
> where a.remote_host = abc.remote_host and abc.ts = a.t

I thought about this too. But what happens if there are accidentally two
sets with same timestamp in the table? I suppose the outer select would
match both of them. But a DISTINCT should help to avoid this.
Ok, if you think this is the propper way to meet the
requirement i will do so.

Thank you for your answer!

Klaus



Re: select within aggregate?

From
Bruno Wolff III
Date:
On Fri, May 06, 2005 at 09:25:35 +0200, Vortex <vortex25@gmx.de> wrote:
> Hi!!
> 
> Of course the following is possible in various ways but i wonder
> if there is an elegant and simple solution:
> 
> A part from apache log:
> remote_host    varchar(50)
> request_uri    varchar(50)
> ts             timestamp with time zone
> 
> Now i want to see which host has done how many requests. This
> is not difficult:
> 
> SELECT remote_host, count(*) FROM apache_log GROUP BY remote_host;
> 
> But now i would like to add a column to the result which
> shows the most recent requested uri for each group.
> This sounds quite easy at first glance but i see no simple solution.
> 
> I suppose with max(ts) i could acquire the time of the
> most recent request within the group, but how do i get
> the corresponding request_uri without doing subquerys or
> something like that?

You could use max(ts) as long as you can have two requests occur at the
same time. This might be a reasonable assumption, but it might have been
better to include a sequence so that each record would be guaranteed to
have a unique key.

Using DISTINCT ON to get the latest URIs is probably better. You can
join that query to the one doing the count. This will probably be more
efficient than executing a subquery for each remote host.


Re: select within aggregate?

From
"Ramakrishnan Muralidharan"
Date:
Hi

create table abc
(
remote_host    varchar(50),request_uri    varchar(50),ts timestamp
);

SELECT abc.remote_host ,  c , abc.request_uri , a.t FROM abc , ( select remote_host , count(*) as c , max( ts ) as t
fromabc group by remote_host ) as a  
where a.remote_host = abc.remote_host and abc.ts = a.t

Regards,
R.Muralidharan

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Vortex
Sent: Friday, May 06, 2005 12:56 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] select within aggregate?


Hi!!

Of course the following is possible in various ways but i wonder
if there is an elegant and simple solution:

A part from apache log:
remote_host    varchar(50)
request_uri    varchar(50)
ts             timestamp with time zone

Now i want to see which host has done how many requests. This
is not difficult:

SELECT remote_host, count(*) FROM apache_log GROUP BY remote_host;

But now i would like to add a column to the result which
shows the most recent requested uri for each group.
This sounds quite easy at first glance but i see no simple solution.

I suppose with max(ts) i could acquire the time of the
most recent request within the group, but how do i get
the corresponding request_uri without doing subquerys or
something like that?

Thank you very much!

Klaus

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command   (send "unregister YourEmailAddressHere" to
majordomo@postgresql.org)


Re: select within aggregate?

From
Vortex
Date:
On Fri, 6 May 2005 07:49:23 -0500
Bruno Wolff III <bruno@wolff.to> wrote:

> You could use max(ts) as long as you can have two requests occur at the
> same time. This might be a reasonable assumption, but it might have been
> better to include a sequence so that each record would be guaranteed to
> have a unique key.
Yes, of course. So far it was more a theoretical question.

> Using DISTINCT ON to get the latest URIs is probably better. You can
> join that query to the one doing the count. This will probably be more
> efficient than executing a subquery for each remote host.
The DISTINCT ON construction looks quite interesting. It seems to allow
some sorting within an group (even althought it's no real aggregate) in
general. I can't say if it's really more eficient than the solution suggested
by R.Muralidharan. But i will do some tests!

Thank you very much!

Klaus