Thread: Querying a Large Partitioned DB
Team Amazing,
I am building a massive database for storing the syslogs of a room of servers. The database gets about 25 million entries a day, and need to keep them for 180 days. So the total size of the database will be about 4.5 billion records.
I need to be able to do full text searches on the message field, and of course, it needs to be reasonably fast.
The table is partitioned daily and has this structure:
syslog=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------------
I am building a massive database for storing the syslogs of a room of servers. The database gets about 25 million entries a day, and need to keep them for 180 days. So the total size of the database will be about 4.5 billion records.
I need to be able to do full text searches on the message field, and of course, it needs to be reasonably fast.
The table is partitioned daily and has this structure:
syslog=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------------
+-------+-------
public | systemevents | table | pgsql
public | systemevents_032909 | table | pgsql
public | systemevents_033009 | table | pgsql
public | systemevents_033109 | table | pgsql
public | systemevents_040109 | table | pgsql
public | systemevents_040209 | table | pgsql
public | systemevents_040309 | table | pgsql
public | systemevents_040409 | table | pgsql
public | systemevents_040509 | table | pgsql
public | systemevents_040609 | table | pgsql
public | systemevents_040709 | table | pgsql
public | systemevents_040909 | table | pgsql
public | systemevents_041009 | table | pgsql
(13 rows)
syslog=# \d systemevents
Table "public.systemevents"
Column | Type | Modifiers
--------------------+-----------------------------+-----------
message | character varying |
facility | integer |
fromhost | character varying(80) |
priority | integer |
devicereportedtime | timestamp without time zone |
receivedat | timestamp without time zone |
infounitid | integer |
syslogtag | character varying(80) |
message_index_col | tsvector |
Rules:
systemevents_insert_032909 AS
ON INSERT TO systemevents
WHERE new.devicereportedtime > '2009-03-28 23:59:59'::timestamp without time zone AND new.devicereportedtime <= '2009-03-29 23:59:59'::timestamp without time zone DO INSTEAD INSERT INTO systemevents_032909 (message, facility, fromhost, priority, devicereportedtime, receivedat, infounitid, syslogtag, message_index_col)
...... [there are rules like that for each partition]
My typical query looks like this:
SELECT * FROM SystemEvents WHERE message_index_col @@ to_tsquery('english', 'Term') LIMIT 25 OFFSET 0;
Here is an explain analyze:
Limit (cost=2422393.00..2422393.06 rows=25 width=153) (actual time=93363.496..93363.610 rows=25 loops=1)
-> Sort (cost=2422393.00..2422933.05 rows=216019 width=153) (actual time=93363.490..93363.532 rows=25 loops=1)
Sort Key: public.systemevents.devicereportedtime
Sort Method: top-N heapsort Memory: 22kB
-> Result (cost=0.00..2416297.10 rows=216019 width=153) (actual time=20567.267..93362.574 rows=163 loops=1)
-> Append (cost=0.00..2415217.01 rows=216019 width=153) (actual time=20567.244..93361.582 rows=163 loops=1)
-> Seq Scan on systemevents (cost=0.00..1750240.39 rows=30891 width=153) (actual time=20567.238..91580.249 rows=24 loops=1)
Filter: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040309 systemevents (cost=1168.86..54860.45 rows=15253 width=152) (actual time=82.429..275.589 rows=20 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040309 (cost=0.00..1165.04 rows=15253 width=0) (actual time=50.029..50.029 rows=20 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040409 systemevents (cost=1038.56..52300.49 rows=14601 width=147) (actual time=68.006..68.006 rows=0 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040409 (cost=0.00..1034.91 rows=14601 width=0) (actual time=67.999..67.999 rows=0 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040509 systemevents (cost=1055.06..52482.72 rows=14644 width=150) (actual time=63.257..63.257 rows=0 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040509 (cost=0.00..1051.40 rows=14644 width=0) (actual time=63.251..63.251 rows=0 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040609 systemevents (cost=1842.50..88135.00 rows=24506 width=152) (actual time=117.747..355.043 rows=34 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040609 (cost=0.00..1836.37 rows=24506 width=0) (actual time=92.079..92.079 rows=34loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040709 systemevents (cost=1844.72..89127.11 rows=24790 width=152) (actual time=114.387..262.360 rows=24 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040709 (cost=0.00..1838.52 rows=24790 width=0) (actual time=84.848..84.848 rows=24loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_032909 systemevents (cost=320.54..17254.18 rows=4841 width=142) (actual time=67.808..67.810 rows=1 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_032909 (cost=0.00..319.33 rows=4841 width=0) (actual time=56.044..56.044 rows=1 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_033009 systemevents (cost=1556.24..75179.65 rows=20931 width=149) (actual time=77.644..335.360 rows=43 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_033009 (cost=0.00..1551.01 rows=20931 width=0) (actual time=72.454..72.454 rows=43loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_033109 systemevents (cost=1892.97..92637.60 rows=25806 width=149) (actual time=86.468..86.856 rows=4 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_033109 (cost=0.00..1886.52 rows=25806 width=0) (actual time=70.397..70.397 rows=4 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040109 systemevents (cost=1395.47..66260.67 rows=18430 width=149) (actual time=85.711..177.369 rows=12 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040109 (cost=0.00..1390.86 rows=18430 width=0) (actual time=67.481..67.481 rows=12 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040209 systemevents (cost=1619.92..76738.75 rows=21326 width=150) (actual time=89.065..89.067 rows=1 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040209 (cost=0.00..1614.58 rows=21326 width=0) (actual time=73.229..73.229 rows=1 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
Total runtime: 93364.070 ms
(49 rows)
Can you give me any tips and suggestions about how to speed this up?
It seems like a smart query planner would understand the rules and know that it should search in the last partitions first, since it is ordering by device reported time.
Thanks!
justin
public | systemevents | table | pgsql
public | systemevents_032909 | table | pgsql
public | systemevents_033009 | table | pgsql
public | systemevents_033109 | table | pgsql
public | systemevents_040109 | table | pgsql
public | systemevents_040209 | table | pgsql
public | systemevents_040309 | table | pgsql
public | systemevents_040409 | table | pgsql
public | systemevents_040509 | table | pgsql
public | systemevents_040609 | table | pgsql
public | systemevents_040709 | table | pgsql
public | systemevents_040909 | table | pgsql
public | systemevents_041009 | table | pgsql
(13 rows)
syslog=# \d systemevents
Table "public.systemevents"
Column | Type | Modifiers
--------------------+-----------------------------+-----------
message | character varying |
facility | integer |
fromhost | character varying(80) |
priority | integer |
devicereportedtime | timestamp without time zone |
receivedat | timestamp without time zone |
infounitid | integer |
syslogtag | character varying(80) |
message_index_col | tsvector |
Rules:
systemevents_insert_032909 AS
ON INSERT TO systemevents
WHERE new.devicereportedtime > '2009-03-28 23:59:59'::timestamp without time zone AND new.devicereportedtime <= '2009-03-29 23:59:59'::timestamp without time zone DO INSTEAD INSERT INTO systemevents_032909 (message, facility, fromhost, priority, devicereportedtime, receivedat, infounitid, syslogtag, message_index_col)
...... [there are rules like that for each partition]
My typical query looks like this:
SELECT * FROM SystemEvents WHERE message_index_col @@ to_tsquery('english', 'Term') LIMIT 25 OFFSET 0;
Here is an explain analyze:
Limit (cost=2422393.00..2422393.06 rows=25 width=153) (actual time=93363.496..93363.610 rows=25 loops=1)
-> Sort (cost=2422393.00..2422933.05 rows=216019 width=153) (actual time=93363.490..93363.532 rows=25 loops=1)
Sort Key: public.systemevents.devicereportedtime
Sort Method: top-N heapsort Memory: 22kB
-> Result (cost=0.00..2416297.10 rows=216019 width=153) (actual time=20567.267..93362.574 rows=163 loops=1)
-> Append (cost=0.00..2415217.01 rows=216019 width=153) (actual time=20567.244..93361.582 rows=163 loops=1)
-> Seq Scan on systemevents (cost=0.00..1750240.39 rows=30891 width=153) (actual time=20567.238..91580.249 rows=24 loops=1)
Filter: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040309 systemevents (cost=1168.86..54860.45 rows=15253 width=152) (actual time=82.429..275.589 rows=20 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040309 (cost=0.00..1165.04 rows=15253 width=0) (actual time=50.029..50.029 rows=20 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040409 systemevents (cost=1038.56..52300.49 rows=14601 width=147) (actual time=68.006..68.006 rows=0 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040409 (cost=0.00..1034.91 rows=14601 width=0) (actual time=67.999..67.999 rows=0 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040509 systemevents (cost=1055.06..52482.72 rows=14644 width=150) (actual time=63.257..63.257 rows=0 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040509 (cost=0.00..1051.40 rows=14644 width=0) (actual time=63.251..63.251 rows=0 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040609 systemevents (cost=1842.50..88135.00 rows=24506 width=152) (actual time=117.747..355.043 rows=34 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040609 (cost=0.00..1836.37 rows=24506 width=0) (actual time=92.079..92.079 rows=34loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040709 systemevents (cost=1844.72..89127.11 rows=24790 width=152) (actual time=114.387..262.360 rows=24 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040709 (cost=0.00..1838.52 rows=24790 width=0) (actual time=84.848..84.848 rows=24loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_032909 systemevents (cost=320.54..17254.18 rows=4841 width=142) (actual time=67.808..67.810 rows=1 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_032909 (cost=0.00..319.33 rows=4841 width=0) (actual time=56.044..56.044 rows=1 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_033009 systemevents (cost=1556.24..75179.65 rows=20931 width=149) (actual time=77.644..335.360 rows=43 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_033009 (cost=0.00..1551.01 rows=20931 width=0) (actual time=72.454..72.454 rows=43loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_033109 systemevents (cost=1892.97..92637.60 rows=25806 width=149) (actual time=86.468..86.856 rows=4 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_033109 (cost=0.00..1886.52 rows=25806 width=0) (actual time=70.397..70.397 rows=4 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040109 systemevents (cost=1395.47..66260.67 rows=18430 width=149) (actual time=85.711..177.369 rows=12 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040109 (cost=0.00..1390.86 rows=18430 width=0) (actual time=67.481..67.481 rows=12 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040209 systemevents (cost=1619.92..76738.75 rows=21326 width=150) (actual time=89.065..89.067 rows=1 loops=1)
Recheck Cond: (message_index_col @@ '''funkju'''::tsquery)
-> Bitmap Index Scan on systemevents_msg_idx_040209 (cost=0.00..1614.58 rows=21326 width=0) (actual time=73.229..73.229 rows=1 loops=1)
Index Cond: (message_index_col @@ '''funkju'''::tsquery)
Total runtime: 93364.070 ms
(49 rows)
Can you give me any tips and suggestions about how to speed this up?
It seems like a smart query planner would understand the rules and know that it should search in the last partitions first, since it is ordering by device reported time.
Thanks!
justin
Justin Funk <funkju@iastate.edu> writes: > Can you give me any tips and suggestions about how to speed this up? Use fewer partitions --- 180 is a lot. Maybe weekly partitioning would be about right. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 10, 2009, at 10:15 AM, Justin Funk wrote: > Team Amazing, > > I am building a massive database for storing the syslogs of a room of > servers. The database gets about 25 million entries a day, and need > to keep > them for 180 days. So the total size of the database will be about 4.5 > billion records. > > I need to be able to do full text searches on the message field, and > of > course, it needs to be reasonably fast. You could use pg-pool II or your own middleware to execute the search query in parallel across all the partitions (maybe not all 180 at once, though). Cheers, M -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.10 (Darwin) iEYEARECAAYFAknfaVAACgkQqVAj6JpR7t65DQCgsN51pMWoY8WXyxss6cXRPHug 4h8An2IufbKuhrw4fyki4gBbjrkkQD0M =5PRb -----END PGP SIGNATURE-----
On Fri, 2009-04-10 at 09:15 -0500, Justin Funk wrote: > I need to be able to do full text searches on the message field, and > of course, it needs to be reasonably fast. > > The table is partitioned daily and has this structure: > My typical query looks like this: > SELECT * FROM SystemEvents WHERE message_index_col @@ > to_tsquery('english', 'Term') LIMIT 25 OFFSET 0; The partitioning does nothing to improve your typical query. Loop through the tables from first to last until you have returned 25 records. That way you won't have to wait to search every table. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support