conceptual method to create high performance query involving time - Mailing list pgsql-performance
From | Don Bowman |
---|---|
Subject | conceptual method to create high performance query involving time |
Date | |
Msg-id | FE045D4D9F7AED4CBFF1B3B813C85337045D81D5@mail.sandvine.com Whole thread Raw |
List | pgsql-performance |
I have a table which has e.g. CREATE TABLE portstats ( id serial, logtime TIMESTAMP, cluster VARCHAR(40), element VARCHAR(40), port INT, rxOctets BIGINT, txOctets BIGINT ); which is used for logging statistics from network equipment. cluster is like the location. rxOctets, txOctets are numbers which increase over time. Now, i would like to generate a chart which shows the bitrate. So i need subtract rxOctets from a previous value and divide by the time range. To be efficient, and avoid fetching too many points, i want the interval between points I select to be a function of the time range. E.g., when I'm doing a 1-day chart, i would like to select points that are 15min apart. When I'm doing a 1-yr query, I would like to select points that are e.g. 4hours apart. I can make this determination in the script that generates the statement. The problem i'm having is that this is a) a very slow operation b) selects all data points on t1, and then the interval apart one on t2... so i still end up with too many points. points are logged every ~5 minutes, but there is some small variation on the interval (and some observations might be missing due to eg communication loss to db). [ a process goes along later and decimates out points as they age to prevent the db from becoming very large]. The query I have is below. The question is ... what is the best strategy for an operation of this nature? SELECT t1.port, t1.logtime AS start, t2.logtime AS end, t1.cluster, t1.element, (8.0 * (t2.rxoctets - t1.rxoctets) / (extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS rxbps, (8.0 * (t2.txoctets - t1.txoctets) / (extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS txbps FROM portstats t1 INNER JOIN portstats t2 ON t2.cluster = t1.cluster AND t2.element = t1.element AND t2.port = t1.port AND t2.logtime = (SELECT logtime FROM portstats t3 WHERE t3.cluster = t1.cluster AND t3.element = t1.element AND t3.port = t1.port AND t3.logtime > t1.logtime + '00:15:00' ORDER BY cluster ASC, element ASC, port ASC, logtime ASC LIMIT 1) WHERE t1.cluster = 'somecluster' AND (t1.element = 'somelement') AND (t1.logtime BETWEEN '2004-01-07 00:00' AND '2004-02-08 00:00') ORDER BY t1.cluster ASC, t1.element ASC, t1.port ASC, t1.logtime ASC ; The query plan for 1 week is below, this takes ~2s to operate. It gets very slow for 1yr. Sort (cost=14055.35..14067.74 rows=4956 width=176) (actual time=1523.956..1538.354 rows=5943 loops=1) Sort Key: t1.svcluster, t1.element, t1.port, t1.logtime -> Merge Join (cost=2304.49..13751.18 rows=4956 width=176) (actual time=1008.620..1329.766 rows=5943 loops=1) Merge Cond: (("outer"."?column10?" = "inner".logtime) AND ("outer".port = "inner".port)) -> Sort (cost=977.39..992.25 rows=5944 width=136) (actual time=678.564..692.974 rows=5943 loops=1) Sort Key: (subplan), t1.port -> Index Scan using portstats_element_idx on portstats t1 (cost=0.00..604.78 rows=5944 width=136) (actual time=0.191..581.311 ro ws=5943 loops=1) Index Cond: (element = 'my-element.mydomain.net'::bpchar) Filter: ((svcluster = 'my-cluster'::bpchar) AND (logtime >= '2004-01-07 00:00:00-05'::timestamp with time zone) AND (logtime <= '2004-02-08 00:00:00-05'::timestamp with time zone)) SubPlan -> Limit (cost=0.00..0.62 rows=1 width=104) (actual time=0.064..0.066 rows=1 loops=5943) -> Index Scan using www6 on portstats t3 (cost=0.00..399.28 rows=643 width=104) (actual time=0.054..0.054 rows=1 l oops=5943) Index Cond: ((svcluster = $1) AND (element = $2) AND (port = $3) AND (logtime > ($4 + '00:15:00'::interval))) -> Sort (cost=1327.10..1356.00 rows=11560 width=136) (actual time=289.168..321.522 rows=11771 loops=1) Sort Key: t2.logtime, t2.port -> Index Scan using portstats_element_idx on portstats t2 (cost=0.00..546.98 rows=11560 width=136) (actual time=0.103..192.027 r ows=11560 loops=1) Index Cond: ('my-element.mydomain.net'::bpchar = element) Filter: (('my-cluster'::bpchar = svcluster)) Total runtime: 1609.411 ms (19 rows)
pgsql-performance by date: