Re: slow queries on large syslog table - Mailing list pgsql-general
From | colm ennis |
---|---|
Subject | Re: slow queries on large syslog table |
Date | |
Msg-id | DKEIIKIBPBFHLPDHMIMGEEACCIAA.colm.ennis@eircom.net Whole thread Raw |
In response to | Re: slow queries on large syslog table (wsheldah@lexmark.com) |
Responses |
Re: slow queries on large syslog table
|
List | pgsql-general |
hi wes, unfortunately the searches people do can be on any combination of date, hostid, and ciscomsgid. there are about 80 of each of these. also as people are usually using this when a problem is detected with a device, they will want to see the most recent message with regard to this so im not sure if caching in a secondary table would be practicable. i have thought about perhaps having a table for each host, and creating this automatically in the trigger for new hosts. this would take a while to develop though and the delays involved in creating a new table from an insert trigger scare me, i fear that we might be dropping syslogs while msyslog is waiting for this to complete. i would imagine that the queries might be speeded up by using some other indexes, well i hope! thanks for your help, colm ennis -----Original Message----- From: wsheldah@lexmark.com [mailto:wsheldah@lexmark.com] Sent: 13 December 2001 14:57 To: colm ennis Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] slow queries on large syslog table If these are your most common queries, and if they don't have to be up-to-the-minute, you might consider creating a separate table that just contains entries that meet your most common criteria, i.e. "WHERE shostid IN (23,3)", or possibly even more restrictive. Update that table daily or hourly with a cron job, index it, probably drop and recreate the indexes after a major update or every so often depending on how much it changes. That should reduce the amount of data the queries need to search, and the criteria you use to build this lookup table can be left out of the actual report queries. A separate query in the web app. could look at live data for those times when you need up-to-the-minute results enough to make it worth waiting for. Hope this helps, Wes Sheldahl "colm ennis" <colm.ennis%eircom.net@interlock.lexmark.com> on 12/13/2001 09:16:42 AM To: pgsql-general%postgresql.org@interlock.lexmark.com cc: (bcc: Wesley Sheldahl/Lex/Lexmark) Subject: [GENERAL] slow queries on large syslog table hi all, my selects using the config below are in some cases extremely slow, i would love if someone would point out ways to speed this up bit. note ive only recently started using postgresql, so feel free to point out the stupid mistakes ive made. im using a postgresql db to store cisco syslogs which are pumped in by msyslog. my postgresql is version 7.1.3, is running on freebsd 4.3 and is the standard ports install. my tables are : create table syslog_table (stimestamp timestamp,shostid integer,sciscomsgid integer, smsg text); create table ciscomsg_table (sciscomsgid serial,sciscomsg varchar(128)); create table host_table (shostid serial,shost varchar(128)); ive created a trigger on insertion to syslog_table which extracts ciscomsg, and updates the other two tables. as you can appreciate my syslog table can be quite large and is constantly growing, it currently has about 1.7 million rows. both ciscomsg_table and host_table have ~ 80 rows. ive also created a web interface for selecting syslogs based on optional combinations of timestamp, hostname and ciscomsg. to speed queries i created some indexes on syslog_table : create index syslog_table_stimestamp_index on syslog_table (stimestamp); create index syslog_table_shostid_index on syslog_table (shostid); create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid); create index syslog_table_shostid_sciscomsgid_index on syslog_table (shostid,sciscomsgid); ive just performed a vacuum analyse for this mail, although i guess syslog insertions will have been blocked for some of the time this running, so im trying to avoid doing this. ok now for queries, row counts, performance and explains. query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp >= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) ORDER BY stimestamp DESC LIMIT 1000 matching messages - 7212 query time(s) - 39 explain - Limit (cost=0.00..34032.89 rows=1000 width=24) -> Index Scan Backward using syslog_table_stimestamp_index on syslog_table (cost=0.00..111846.03 rows=3286 width=24) query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp >= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000 matching messages - 5 query time(s) - 79 explain - Limit (cost=75828.24..75828.24 rows=437 width=24) -> Sort (cost=75828.24..75828.24 rows=438 width=24) -> Seq Scan on syslog_table (cost=0.00..75809.05 rows=438 width=24) query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp >= '2001-12-06 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000 matching messages - 9 query time(s) - 73 explain - Limit (cost=0.00..66287.05 rows=1000 width=24) -> Index Scan Backward using syslog_table_stimestamp_index on syslog_table (cost=0.00..421735.44 rows=6362 width=24) query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp >= '2001-11-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000 matching messages - 19 query time(s) - 224 explain - Limit (cost=0.00..34831.93 rows=1000 width=24) -> Index Scan Backward using syslog_table_stimestamp_index on syslog_table (cost=0.00..581118.60 rows=16684 width=24) query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000 total messages/matching messages - /19 query time(s) - 225 explain - Limit (cost=0.00..34559.46 rows=1000 width=24) -> Index Scan Backward using syslog_table_stimestamp_index on syslog_table (cost=0.00..577149.86 rows=16700 width=24) i dont really know enough to be able to interpret the explain output but clearly only the syslog_table_stimestamp_index index is being used. as you can see the query times are terrible and approach browser timeouts. please, does anyone have any ideas on how to speed this up? thanks in advance for any help i get, colm ennis ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
pgsql-general by date: