Re: slow queries on large syslog table - Mailing list pgsql-general

From colm ennis
Subject Re: slow queries on large syslog table
Date
Msg-id DKEIIKIBPBFHLPDHMIMGEEACCIAA.colm.ennis@eircom.net
Whole thread Raw
In response to Re: slow queries on large syslog table  (wsheldah@lexmark.com)
Responses Re: slow queries on large syslog table
List pgsql-general
hi wes,

unfortunately the searches people do can be on any combination of date,
hostid, and ciscomsgid. there are about 80 of each of these.

also as people are usually using this when a problem is detected with a
device, they will want to see the most recent message with regard to this
so im not sure if caching in a secondary table would be practicable.

i have thought about perhaps having a table for each host, and creating
this automatically in the trigger for new hosts. this would take a while
to develop though and the delays involved in creating a new table from
an insert trigger scare me, i fear that we might be dropping syslogs while
msyslog is waiting for this to complete.

i would imagine that the queries might be speeded up by using some other
indexes, well i hope!

thanks for your help,

colm ennis


-----Original Message-----
From: wsheldah@lexmark.com [mailto:wsheldah@lexmark.com]
Sent: 13 December 2001 14:57
To: colm ennis
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] slow queries on large syslog table




If these are your most common queries, and if they don't have to be
up-to-the-minute, you might consider creating a separate table that just
contains entries that meet your most common criteria, i.e. "WHERE shostid IN
(23,3)", or possibly even more restrictive. Update that table daily or
hourly
with a cron job, index it, probably drop and recreate the indexes after a
major
update or every so often depending on how much it changes. That should
reduce
the amount of data the queries need to search, and the criteria you use to
build
this lookup table can be left out of the actual report queries. A separate
query
in the web app. could look at live data for those times when you need
up-to-the-minute results enough to make it worth waiting for.

Hope this helps,

Wes Sheldahl



"colm ennis" <colm.ennis%eircom.net@interlock.lexmark.com> on 12/13/2001
09:16:42 AM

To:   pgsql-general%postgresql.org@interlock.lexmark.com
cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject:  [GENERAL] slow queries on large syslog table


hi all,

my selects using the config below are in some cases extremely slow, i would
love
if someone would point out ways to speed this up bit. note ive only recently
started using postgresql, so feel free to point out the stupid mistakes ive
made.

im using a postgresql db to store cisco syslogs which are pumped in by
msyslog.

my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
standard ports install.

my tables are :
     create table syslog_table (stimestamp timestamp,shostid
integer,sciscomsgid
integer, smsg text);
     create table ciscomsg_table (sciscomsgid serial,sciscomsg
varchar(128));
     create table host_table (shostid serial,shost varchar(128));

ive created a trigger on insertion to syslog_table which extracts ciscomsg,
and updates the other two tables.

as you can appreciate my syslog table can be quite large and is constantly
growing, it currently has about 1.7 million rows. both ciscomsg_table and
host_table have ~ 80 rows.

ive also created a web interface for selecting syslogs based on optional
combinations of timestamp, hostname and ciscomsg.

to speed queries i created some indexes on syslog_table :
     create index syslog_table_stimestamp_index on syslog_table
(stimestamp);
     create index syslog_table_shostid_index on syslog_table (shostid);
     create index syslog_table_sciscomsgid_index on syslog_table
(sciscomsgid);
     create index syslog_table_shostid_sciscomsgid_index on syslog_table
(shostid,sciscomsgid);

ive just performed a vacuum analyse for this mail, although i guess syslog
insertions will have been blocked for some of the time this running, so im
trying to avoid doing this.

ok now for queries, row counts, performance and explains.

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE
(stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) ORDER BY
stimestamp DESC LIMIT 1000
     matching messages - 7212
     query time(s) - 39
     explain - Limit  (cost=0.00..34032.89 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..111846.03 rows=3286 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE
(stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 5
     query time(s) - 79
     explain - Limit  (cost=75828.24..75828.24 rows=437 width=24)
  ->  Sort  (cost=75828.24..75828.24 rows=438 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..75809.05 rows=438
width=24)

query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-06 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 9
     query time(s) - 73
     explain - Limit  (cost=0.00..66287.05 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..421735.44 rows=6362 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE
(stimestamp
>= '2001-11-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 19
     query time(s) - 224
     explain - Limit  (cost=0.00..34831.93 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..581118.60 rows=16684 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid
IN
(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     total messages/matching messages - /19
     query time(s) - 225
     explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..577149.86 rows=16700 width=24)

i dont really know enough to be able to interpret the explain output but
clearly only the syslog_table_stimestamp_index index is being used.

as you can see the query times are terrible and approach browser timeouts.

please, does anyone have any ideas on how to speed this up?

thanks in advance for any help i get,

colm ennis


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)






pgsql-general by date:

Previous
From: "Henshall, Stuart - WCP"
Date:
Subject: timestamp('01/12/01') doesn't work in 7.2 beta4
Next
From: Jason Earl
Date:
Subject: Re: Money reformatting