count number of concurrent requests - Mailing list pgsql-general

From Ondrej Ivanič
Subject count number of concurrent requests
Date
Msg-id CAM6mieLP8GcGNCsZOFPrA2SEiZFNikcwC62DC1gA2T+Om5Kgvg@mail.gmail.com
Whole thread Raw
List pgsql-general
Hi,

I have the following table:

dwh=> \d events
                 Table "public.events"
      Column      |            Type             | Modifiers
------------------+-----------------------------+-----------
 datetime         | timestamp without time zone |
 request_duration | integer                     |

dwh=> select count(*) from events;
  count
----------
 82912116
(1 row)

Each row represents one event which started at 'datetime' and finished
at 'datetime +  request_duration'. I would like to know how many other
events started between 'datetime' and ''datetime +  request_duration'
(ie. concurrency).

I've started with this query (CTE + join):
with e as (
    select
        datetime as date_s,
        datetime + (request_duration::text || ' msec')::interval as date_e,
        request_duration
    from events
    where datetime < '2012-08-01 00:01:00'
)
select
    e1.date_s,
    e1.date_e,
    count(*) as "count",
    count(case when e1.request_duration > 1000 then true else null
end) as "over 1000"
from e as e1
left join e as e2 on (e2.date_s between e1.date_s and e1.date_e)
group by e1.date_s, e1.date_e
having count(case when e1.request_duration > 1000 then true else null end) > 0

which is incredibly slow (as expected) and I can not analyse more than
several minutes of real traffic. I need to run this query over few
days at least. Second try was this one:

select
    date_s,
    date_e,
    counts[1] as "count",
    counts[2] as "over 1000"
from (
    select
        datetime as date_s,
        datetime + (request_duration::text || ' msec')::interval as date_e,
        (
            select
                array[
                    count(*),
                    count(case when ee.request_duration > 1000 then
true else null end)
                ]
            from events ee
            where ee.datetime < '2012-08-01 00:01:00'
                and ee.datetime >= e.datetime
                and ee.datetime <= e.datetime +
(e.request_duration::text || ' msec')::interval
        ) as counts
    from events as e
    where datetime < '2012-08-01 00:01:00'
) as x
where counts[2] > 0

Which is much better (like few minutes per day) but I can not use any
of those queries because:
- events table is partitioned so the first one is not "going to work"
at all and second could be fixed by adding date ranges to "select ...
from events ee" query
- and I would like to have one query across two database platform but
other one doesn't support "correlated subquery"

Hence I've written simple PHP script which looks like "Window
Function". I tried to rewrite query using window function(s) but I
can't get my head around it.

Any ideas? I'm looking for something which is relatively fast and
doesn't use "correlated subquery"...

Thanks!

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)


pgsql-general by date:

Previous
From: Chris Travers
Date:
Subject: Re: Schemas vs partitioning vs multiple databases for archiving
Next
From: Wolfgang Keller
Date:
Subject: Re: Messy data models (Re: Visualize database schema)