Thread: Calculating 95th percentiles

Calculating 95th percentiles

From
Landreville
Date:
This is not a performance bug -- my query runs in a reasonable amount
of time, but I would like to see if I can make this calculation any
faster in my setup.

I have a table:
volume_id serial primary key
switchport_id integer not null
in_octets bigint not null
out_octets bigint not null
insert_timestamp timestamp default now()
with indexes on volume_id, switchport_id, insert_timestamp.

That is partitioned into about 3000 tables by switchport_id (an FK to
a lookup table); each table currently has about 30,000 rows (a row is
inserted into each table every 5 minutes).
I have select queries that filter on switchport_id and timestamp.
Constraint exclusion on switchport_id picks the right partition table,
and insert_timestamp has an index on it (on each partition).
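
Roughly, the layout is the usual inheritance setup, something like
this (a simplified sketch of the parent and one partition; the exact
DDL differs, but the CHECK constraint is what constraint exclusion
keys on and the index names are the ones that appear in the plan
below):

create table volume (
    volume_id        serial primary key,
    switchport_id    integer not null,   -- FK to the switchport lookup table
    in_octets        bigint not null,
    out_octets       bigint not null,
    insert_timestamp timestamp default now()
);
create index volume_parent_switchport_id_idx    on volume (switchport_id);
create index volume_parent_insert_timestamp_idx on volume (insert_timestamp);

-- one child table per switchport, e.g. for switchport 114:
create table volume_114 (
    check (switchport_id = 114)          -- what constraint exclusion keys on
) inherits (volume);
create index volume_114_insert_timestamp_idx on volume_114 (insert_timestamp);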

Any time the volume tables are queried it is to calculate the deltas
of in_octets and out_octets between each row and the previous row
(ordered by timestamp). The delta is used because the external system
the data is retrieved from will sometimes roll the value over. I have
a function to do this calculation:


create or replace function traffic.get_delta_table(p_switchport_id integer,
                                p_start_date date, p_end_date date)
returns table(  volume_id integer,
                insert_timestamp timestamp,
                out_delta bigint,
                out_rate bigint,
                out_rate_order bigint,
                in_delta bigint,
                in_rate bigint,
                in_rate_order bigint)
as $$
begin
    -- we need to force pgsql to make a new plan for each query so it can
    -- use constraint exclusion on switchport_id to determine the
    -- partition table to scan
    return query execute 'select
        t.volume_id,
        t.insert_timestamp,
        t.out_delta,
        t.out_delta * 8 / t.time_difference as out_rate,
        row_number() over (order by t.out_delta * 8 / t.time_difference) as out_rate_order,
        t.in_delta,
        t.in_delta * 8 / t.time_difference as in_rate,
        row_number() over (order by t.in_delta * 8 / t.time_difference) as in_rate_order
    from
    (select
        n.volume_id,
        n.insert_timestamp,
        extract(epoch from (n.insert_timestamp -
            lag(n.insert_timestamp, 1, n.insert_timestamp) over (order by n.insert_timestamp)
        ))::integer as time_difference,
        case
            when n.out_octets < lag(out_octets, 1, n.out_octets) over (order by n.insert_timestamp)
            then n.out_octets
            else n.out_octets - lag(out_octets, 1, n.out_octets) over (order by n.insert_timestamp)
        end as out_delta,
        case
            when n.in_octets < lag(in_octets, 1, n.in_octets) over (order by n.insert_timestamp)
            then n.in_octets
            else n.in_octets - lag(in_octets, 1, n.in_octets) over (order by n.insert_timestamp)
        end as in_delta
        from volume as n
        where n.insert_timestamp between $1 and $2
        and n.switchport_id = ' || p_switchport_id || '
        and in_octets != 0
        and out_octets != 0
    ) as t
    where time_difference is not null and time_difference != 0'
    using p_start_date, p_end_date;

end; $$ language plpgsql;
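
A typical call looks like this (these are the same parameters that
show up in the plan below):

select * from traffic.get_delta_table(114, '2011-02-01', '2011-03-02');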

The query inside the function's plan:
 WindowAgg  (cost=2269.62..2445.35 rows=6390 width=32) (actual time=7526.526..7531.855 rows=6622 loops=1)
   ->  Sort  (cost=2269.62..2285.60 rows=6390 width=32) (actual time=7526.497..7527.924 rows=6622 loops=1)
         Sort Key: (((t.in_delta * 8) / t.time_difference))
         Sort Method:  external sort  Disk: 432kB
         ->  WindowAgg  (cost=1753.90..1865.72 rows=6390 width=32) (actual time=2613.593..2618.727 rows=6622 loops=1)
               ->  Sort  (cost=1753.90..1769.87 rows=6390 width=32) (actual time=2613.566..2614.550 rows=6622 loops=1)
                     Sort Key: (((t.out_delta * 8) / t.time_difference))
                     Sort Method:  quicksort  Memory: 710kB
                     ->  Subquery Scan on t  (cost=978.89..1350.00 rows=6390 width=32) (actual time=2582.254..2606.708 rows=6622 loops=1)
                           Filter: ((t.time_difference IS NOT NULL) AND (t.time_difference <> 0))
                           ->  WindowAgg  (cost=978.89..1269.32 rows=6454 width=28) (actual time=2582.243..2596.546 rows=6623 loops=1)
                                 ->  Sort  (cost=978.89..995.03 rows=6454 width=28) (actual time=2582.120..2583.172 rows=6623 loops=1)
                                       Sort Key: n.insert_timestamp
                                       Sort Method:  quicksort  Memory: 710kB
                                       ->  Result  (cost=8.87..570.49 rows=6454 width=28) (actual time=1036.720..2576.755 rows=6623 loops=1)
                                             ->  Append  (cost=8.87..570.49 rows=6454 width=28) (actual time=1036.718..2574.719 rows=6623 loops=1)
                                                   ->  Bitmap Heap Scan on volume n  (cost=8.87..12.90 rows=1 width=28) (actual time=0.055..0.055 rows=0 loops=1)
                                                         Recheck Cond: ((switchport_id = 114) AND (insert_timestamp >= '2011-02-01 00:00:00'::timestamp without time zone) AND (insert_timestamp <= '2011-03-02 00:00:00'::timestamp without time zone))
                                                         Filter: ((in_octets <> 0) AND (out_octets <> 0))
                                                         ->  BitmapAnd  (cost=8.87..8.87 rows=1 width=0) (actual time=0.053..0.053 rows=0 loops=1)
                                                               ->  Bitmap Index Scan on volume_parent_switchport_id_idx  (cost=0.00..4.30 rows=7 width=0) (actual time=0.045..0.045 rows=0 loops=1)
                                                                     Index Cond: (switchport_id = 114)
                                                               ->  Bitmap Index Scan on volume_parent_insert_timestamp_idx  (cost=0.00..4.32 rows=7 width=0) (never executed)
                                                                     Index Cond: ((insert_timestamp >= '2011-02-01 00:00:00'::timestamp without time zone) AND (insert_timestamp <= '2011-03-02 00:00:00'::timestamp without time zone))
                                                   ->  Bitmap Heap Scan on volume_114 n  (cost=142.40..557.59 rows=6453 width=28) (actual time=1036.662..2573.116 rows=6623 loops=1)
                                                         Recheck Cond: ((insert_timestamp >= '2011-02-01 00:00:00'::timestamp without time zone) AND (insert_timestamp <= '2011-03-02 00:00:00'::timestamp without time zone))
                                                         Filter: ((in_octets <> 0) AND (out_octets <> 0) AND (switchport_id = 114))
                                                         ->  Bitmap Index Scan on volume_114_insert_timestamp_idx  (cost=0.00..140.78 rows=6453 width=0) (actual time=981.034..981.034 rows=6623 loops=1)
                                                               Index Cond: ((insert_timestamp >= '2011-02-01 00:00:00'::timestamp without time zone) AND (insert_timestamp <= '2011-03-02 00:00:00'::timestamp without time zone))
 Total runtime: 7567.261 ms

This ends up taking anywhere from 300ms to 7000ms (usually it's around
300-400ms) and returns about 8000 rows.

To get the 95th percentile it's a couple of simple selects after putting
the result set from the above function into a temporary table:
    create temporary table deltas on commit drop as
        select * from get_delta_table(p_switchport_id, p_start_date, p_end_date);

    select round(count(volume_id) * 0.95) into v_95th_row from deltas;
    select in_rate into v_record.in_95th from deltas where in_rate_order = v_95th_row;
    select out_rate into v_record.out_95th from deltas where out_rate_order = v_95th_row;
    select sum(in_delta), sum(out_delta) into v_record.in_total, v_record.out_total from deltas;

Unfortunately using a temporary table means that I cannot run this
query on the read-only slave, but I can't see a way around using one.
The master has 3000 inserts running against it every 5 minutes -- it
used to be every 1 minute, but the space and time needed to calculate
5x the current number of rows was not worth it.

My server has 1GB of memory, runs Red Hat, and the only daemon on it is Postgres:
effective_cache_size is 768MB
shared_buffers is 256MB
work_mem is 2MB (I changed it when explain analyze was showing 1.5MB
used for an on-disk sort)
max_locks_per_transaction is 3000 (I changed it when I started getting
the error that suggests raising max_locks_per_transaction)


Any ideas on speeding this up?

Thanks,

Landreville

Re: Calculating 95th percentiles

From
"Pierre C"
Date:
> Any time the volume tables are queried it is to calculate the deltas
> of in_octets and out_octets between each row and the previous row
> (ordered by timestamp). The delta is used because the external system
> the data is retrieved from will sometimes roll the value over. I have
> a function to do this calculation:

Would it be possible to do this at insert time and store the deltas
directly?
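
Something like a per-partition BEFORE INSERT trigger might do it (an
untested sketch; it assumes in_delta / out_delta columns have been
added and that inserts go directly into the partition tables):

create or replace function traffic.set_deltas() returns trigger as $$
declare
    prev_in  bigint;
    prev_out bigint;
begin
    -- find the previous row in the same partition; EXECUTE so the plan
    -- targets only the table this trigger fired on
    execute 'select in_octets, out_octets from '
            || quote_ident(TG_TABLE_SCHEMA) || '.' || quote_ident(TG_TABLE_NAME)
            || ' where insert_timestamp < $1 order by insert_timestamp desc limit 1'
    into prev_in, prev_out
    using new.insert_timestamp;

    if prev_in is not null then
        -- same rollover handling as get_delta_table(): if the counter went
        -- backwards, assume a rollover and take the new value as the delta
        new.in_delta  := case when new.in_octets  < prev_in  then new.in_octets
                              else new.in_octets  - prev_in  end;
        new.out_delta := case when new.out_octets < prev_out then new.out_octets
                              else new.out_octets - prev_out end;
    end if;
    return new;
end;
$$ language plpgsql;

-- one trigger per partition, e.g.:
create trigger volume_114_set_deltas
    before insert on volume_114
    for each row execute procedure traffic.set_deltas();

Because the lookup goes through EXECUTE against the partition the
trigger fired on, it stays on a single table and sidesteps the
constraint-exclusion issue you work around in get_delta_table().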

Re: Calculating 95th percentiles

From
marcin mank
Date:
On Fri, Mar 4, 2011 at 4:18 PM, Landreville
<landreville@deadtreepages.com> wrote:

>    create temporary table deltas on commit drop as
>        select * from get_delta_table(p_switchport_id, p_start_date, p_end_date);
>
>    select round(count(volume_id) * 0.95) into v_95th_row from deltas;
>    select in_rate into v_record.in_95th from deltas where in_rate_order = v_95th_row;
>    select out_rate into v_record.out_95th from deltas where out_rate_order = v_95th_row;
>    select sum(in_delta), sum(out_delta) into v_record.in_total, v_record.out_total from deltas;
>
> Unfortunately using a temporary table means that I cannot run this
> query on the read-only slave, but I can't see a way around using one.

Is this fast enough on a slave:


with deltas as (select * from get_delta_table(...)),
p95 as (select round(count(volume_id) * 0.95) as p95v from deltas)
select
    (select in_rate from deltas, p95 where in_rate_order = p95v),
    (select out_rate from deltas, p95 where out_rate_order = p95v)
etc..

?

Greetings
Marcin

Re: Calculating 95th percentiles

From
Landreville
Date:
On Sat, Mar 5, 2011 at 7:34 PM, marcin mank <marcin.mank@gmail.com> wrote:
> Is this fast enough on a slave:
>
>
> with deltas as (select * from get_delta_table(...)),
> p95 as (select round(count(volume_id) * 0.95) as p95v from deltas)
> select
>     (select in_rate from deltas, p95 where in_rate_order = p95v),
>     (select out_rate from deltas, p95 where out_rate_order = p95v)
> etc..
> Greetings
> Marcin
>

I really didn't know you could use a WITH statement on a read-only
database -- I don't think I even knew that WITH existed in Postgres
(is it documented somewhere?). I will try this out.
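
Based on your example, I expect the full query to look something like
this (untested on my side):

with deltas as (
    select * from get_delta_table(p_switchport_id, p_start_date, p_end_date)
), p95 as (
    select round(count(volume_id) * 0.95) as p95v from deltas
)
select
    (select in_rate  from deltas, p95 where in_rate_order  = p95v) as in_95th,
    (select out_rate from deltas, p95 where out_rate_order = p95v) as out_95th,
    (select sum(in_delta)  from deltas) as in_total,
    (select sum(out_delta) from deltas) as out_total;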

I am also looking at Pierre's suggestion of calculating the delta
value on insert. To do this I will need to update all the rows
currently in the partitioned tables. Does anyone know whether the
update below will still use constraint exclusion in the correlated
subquery, or whether it will scan every partitioned table for each
updated row?

update volume
    set in_delta  = volume.in_octets  - vprev.in_octets,
        out_delta = volume.out_octets - vprev.out_octets
from volume vprev
where vprev.switchport_id = volume.switchport_id
  and vprev.insert_timestamp =
      (select max(insert_timestamp) from volume v
       where v.switchport_id = volume.switchport_id
       and   v.insert_timestamp < volume.insert_timestamp);

I suppose I can check that with an explain before I execute it (I still
have to alter the table to add the delta columns).
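
The new columns themselves should just be the following (adding them
on the parent propagates to the inherited partitions, though with
~3000 partitions that is a lot of locks in one transaction):

alter table volume
    add column in_delta  bigint,
    add column out_delta bigint;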

Thanks,

Landreville