Performance issues with large amounts of time-series data - Mailing list pgsql-performance

From: Hrishikesh (हृषीकेश मेहेंदळे)
Subject: Performance issues with large amounts of time-series data
Date:
Msg-id: a7c00d4b0908261031n585aa9b0x630a2534a4afafcc@mail.gmail.com
Responses: Re: Performance issues with large amounts of time-series data (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: Performance issues with large amounts of time-series data (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-performance
Hi All,

We are improving our network appliance monitoring system and are evaluating PostgreSQL as the back-end traffic statistics database (we currently run a home-grown Berkeley DB-based statistics database).

We log data from various network elements (mainly in/out byte and packet counters, recorded for every port we see traffic on). The system should eventually expect data from 2000 devices (at the moment it's only about 250), with a monitoring target of 100 ports (the limit is not enforced at 100; in practice we've seen only about 20-30 ports in a given timeframe, and only about 50 distinct ports over a whole year of monitoring) -- this is akin to RRD (e.g. MRTG or Cacti), but with a lot more flexibility.

Our current monitoring system reports the data per device as:

key = {device_id (uint64), identifier (uint32), sub_identifier (uint32), unix_time} (these four taken together are unique)
data = 4 x uint64 (BIGINT in PG tables)


My table structure in PG mirrors this format, with a UNIQUE constraint across the four key columns and an index on each column separately. The data is received every 5 minutes and stored at 5-minute, 1-hour and 1-day granularities in partitioned tables: each granularity has a parent such as stats_300, with inherited children named stats_300_<begintime>_<endtime>, and so on. I currently split the 5-minute tables every 2 hours, the 1-hour tables every 2 days, and the 1-day tables every month.
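
To make that concrete, here is a minimal sketch of one parent and one child table (illustrative only, not our exact DDL; the per-column indexes shown in the \d output further below are omitted):

  CREATE TABLE stats_300 (
      main_id     integer,
      sub_id      integer,
      "timestamp" integer,
      device      bigint,
      data_0      bigint,
      data_1      bigint,
      data_2      bigint,
      data_3      bigint
  );

  -- One 2-hour child; the CHECK constraint is what lets
  -- constraint_exclusion skip children outside the queried range.
  CREATE TABLE stats_300_1251705600_1251712799 (
      CHECK ("timestamp" >= 1251705600 AND "timestamp" <= 1251712799)
  ) INHERITS (stats_300);

  -- The four-part key is enforced per child:
  CREATE UNIQUE INDEX stats_300_1251705600_1251712799_unique_check
      ON stats_300_1251705600_1251712799 (main_id, sub_id, "timestamp", device);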

For this schema, the typical queries would be:

For time-series graphs (drawn as bar/line graphs):
  SELECT TIMESTAMP, SUM(DATA_0), SUM(DATA_1), SUM(DATA_2), SUM(DATA_3)
    FROM <appropriate parent table> WHERE TIMESTAMP >= X AND TIMESTAMP < Y
    AND DEVICE IN (id1, id2, id3, ..... up to 2000 IDs can be here)
    GROUP BY TIMESTAMP;

For aggregate graphs (graphed as a pie chart):
  SELECT SUB_ID,  SUM(DATA_0), SUM(DATA_1), SUM(DATA_2), SUM(DATA_3)
    FROM <appropriate top table> WHERE TIMESTAMP >= X AND TIMESTAMP < Y
    AND DEVICE IN (id1, id2, id3, ..... up to 2000 IDs can be here)
    GROUP BY SUB_ID;
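
The long IN list is generated programmatically. A hypothetical equivalent we could issue instead would pass the IDs as a single array parameter (we have not benchmarked this form; the numbers below are for the IN-list version):

  SELECT TIMESTAMP, SUM(DATA_0), SUM(DATA_1), SUM(DATA_2), SUM(DATA_3)
    FROM <appropriate parent table> WHERE TIMESTAMP >= X AND TIMESTAMP < Y
    AND DEVICE = ANY (ARRAY[id1, id2, id3, ...]::bigint[])
    GROUP BY TIMESTAMP;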

In my timing tests, PG performs quite a lot worse than the equivalent BerkeleyDB implementation. Specifically, I get the following timing results:

For the longest-running queries:
BDB - 10-15 sec (cold), <2 sec (warm, i.e. if I rerun the query immediately)
PG (command line) - 25-30 sec (cold), 25-30 sec (warm)
PG (via libpqxx) - ~40 sec (cold), 25-30 sec (warm)

The data is immutable once it goes in (unless I DROP TABLE), and I ran VACUUM FULL ANALYZE on the whole database *before* the timing queries.

An EXPLAIN ANALYZE looks like this (the tables are prepopulated with data for 2000 devices and 100 sub_ids):

mydb=> explain analyze SELECT TIMESTAMP, SUM(DATA_0), SUM(DATA_1), SUM(DATA_2), SUM(DATA_3) FROM stats_3600 WHERE MAIN_ID = 1 AND SUB_ID = 0 AND TIMESTAMP >= 1251676859 AND TIMESTAMP <= 1251849659 GROUP BY TIMESTAMP;
                                                                                  QUERY PLAN                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=226659.20..226663.20 rows=200 width=36) (actual time=1709.651..1709.745 rows=48 loops=1)
   ->  Append  (cost=0.00..225288.47 rows=109659 width=36) (actual time=33.840..1264.328 rows=96000 loops=1)
         ->  Index Scan using uniq_3600 on stats_3600  (cost=0.00..8.28 rows=1 width=36) (actual time=0.019..0.019 rows=0 loops=1)
               Index Cond: ((main_id = 1) AND (sub_id = 0) AND ("timestamp" >= 1251676859) AND ("timestamp" <= 1251849659))
         ->  Bitmap Heap Scan on stats_3600_1251590400_1251763199 stats_3600  (cost=2131.71..112946.75 rows=60642 width=36) (actual time=33.816..495.239 rows=46000 loops=1)
               Recheck Cond: ((main_id = 1) AND (sub_id = 0) AND ("timestamp" >= 1251676859) AND ("timestamp" <= 1251849659))
               ->  Bitmap Index Scan on stats_3600_1251590400_1251763199_unique_check  (cost=0.00..2116.55 rows=60642 width=0) (actual time=21.415..21.415 rows=46000 loops=1)
                     Index Cond: ((main_id = 1) AND (sub_id = 0) AND ("timestamp" >= 1251676859) AND ("timestamp" <= 1251849659))
         ->  Bitmap Heap Scan on stats_3600_1251763200_1251935999 stats_3600  (cost=1727.24..112333.44 rows=49016 width=36) (actual time=38.169..526.578 rows=50000 loops=1)
               Recheck Cond: ((main_id = 1) AND (sub_id = 0) AND ("timestamp" >= 1251676859) AND ("timestamp" <= 1251849659))
               ->  Bitmap Index Scan on stats_3600_1251763200_1251935999_unique_check  (cost=0.00..1714.99 rows=49016 width=0) (actual time=24.059..24.059 rows=50000 loops=1)
                     Index Cond: ((main_id = 1) AND (sub_id = 0) AND ("timestamp" >= 1251676859) AND ("timestamp" <= 1251849659))
 Total runtime: 1710.844 ms
(13 rows)


mydb=> explain analyze SELECT SUB_ID, SUM(DATA_0), SUM(DATA_1), SUM(DATA_2), SUM(DATA_3) FROM stats_3600 WHERE MAIN_ID = 1 AND TIMESTAMP >= 1251676859 AND TIMESTAMP <= 1251849659  GROUP BY SUB_ID;
                                                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=881887.53..881891.53 rows=200 width=36) (actual time=82007.298..82007.493 rows=99 loops=1)
   ->  Append  (cost=0.00..771583.84 rows=8824295 width=36) (actual time=37.206..42504.106 rows=8819844 loops=1)
         ->  Index Scan using uniq_3600 on stats_3600  (cost=0.00..8.32 rows=1 width=36) (actual time=0.024..0.024 rows=0 loops=1)
               Index Cond: ((main_id = 1) AND ("timestamp" >= 1251676859) AND ("timestamp" <= 1251849659))
         ->  Index Scan using idx_ts_stats_3600_1251590400_1251763199 on stats_3600_1251590400_1251763199 stats_3600  (cost=0.00..369424.65 rows=4234747 width=36) (actual time=37.178..9776.530 rows=4226249 loops=1)
               Index Cond: (("timestamp" >= 1251676859) AND ("timestamp" <= 1251849659))
               Filter: (main_id = 1)
         ->  Index Scan using idx_ts_stats_3600_1251763200_1251935999 on stats_3600_1251763200_1251935999 stats_3600  (cost=0.00..402150.87 rows=4589547 width=36) (actual time=0.119..11339.277 rows=4593595 loops=1)
               Index Cond: (("timestamp" >= 1251676859) AND ("timestamp" <= 1251849659))
               Filter: (main_id = 1)
 Total runtime: 82007.762 ms

The corresponding table definition looks like:
mydb=> \d stats_3600_1251590400_1251763199
Table "public.stats_3600_1251590400_1251763199"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 main_id     | integer |
 sub_id      | integer |
 timestamp   | integer |
 device      | bigint  |
 data_0      | bigint  |
 data_1      | bigint  |
 data_2      | bigint  |
 data_3      | bigint  |
Indexes:
    "stats_3600_1251590400_1251763199_unique_check" UNIQUE, btree (main_id, sub_id, "timestamp", device)
    "idx_cid_stats_3600_1251590400_1251763199" btree (main_id)
    "idx_scid_stats_3600_1251590400_1251763199" btree (sub_id)
    "idx_dev_stats_3600_1251590400_1251763199" btree (device)
    "idx_ts_stats_3600_1251590400_1251763199" btree ("timestamp")
Check constraints:
    "stats_3600_1251590400_1251763199_timestamp_check" CHECK ("timestamp" >= 1251590400 AND "timestamp" <= 1251763199)
Inherits: stats_3600

The tables contain the following data (others are similar):
mydb=> select relname, relpages, reltuples from pg_class where relname like 'stats_%';
                    relname                     | relpages |  reltuples 
------------------------------------------------+----------+-------------
 stats_300_1251705600_1251712799                |    49532 |  4.8046e+06
 stats_3600_1251763200_1251935999               |   181861 | 1.76404e+07
 stats_86400_1244160000_1246751999              |    61845 | 5.99888e+06
[the rest truncated for brevity]


So my questions are:
1. Is there anything I can do to speed up these queries? Even warm performance comparable to the BDB version's would be a big improvement over the current numbers.
2. Does the order in which the data was received, relative to the order in which it is queried, matter? (If so, I can either cache the data before writing it to the DB, or rewrite each table when I roll over to the next one; see the sketch below.)
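
For question 2, the "rewrite" I have in mind would be roughly a one-off CLUSTER of each child table in key order once it stops receiving data (a sketch using the 8.3 syntax and the unique index from the \d output above; not something we do today):

  CLUSTER stats_3600_1251590400_1251763199
      USING stats_3600_1251590400_1251763199_unique_check;
  ANALYZE stats_3600_1251590400_1251763199;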


System Configuration:
 - 64-bit quad-core Xeon with 6 GB RAM
 - 4x250 GB SATA disks configured as RAID stripe+mirror
 - Linux 2.6.9-34 with some custom patches (CentOS 4.2 based)
 - postgres 8.3.7 (from sources, no special config options, installed to /var/opt/pgsql-8.3)
 - C++ interface using libpqxx-3.0 (also built from sources)
 - Relevant postgresql.conf parameters:
     data_directory = /data/pg (400 GB partition)
     max_connections = 8
     shared_buffers = 128MB
     work_mem = 256MB
     maintenance_work_mem = 64MB
     effective_cache_size = 2048MB
     max_fsm_pages = 204800
     default_statistics_target = 100
     constraint_exclusion = on


Thanks Much!
Hrishi
