Re: slow queries on large syslog table - Mailing list pgsql-general

From wsheldah@lexmark.com
Subject Re: slow queries on large syslog table
Date
Msg-id 200112131458.JAA27115@interlock2.lexmark.com
Whole thread Raw
In response to slow queries on large syslog table  ("colm ennis" <colm.ennis@eircom.net>)
Responses Re: slow queries on large syslog table
List pgsql-general

If these are your most common queries, and if they don't have to be
up-to-the-minute, you might consider creating a separate table that just
contains entries that meet your most common criteria, i.e. "WHERE shostid IN
(23,3)", or possibly even more restrictive. Update that table daily or hourly
with a cron job, index it, probably drop and recreate the indexes after a major
update or every so often depending on how much it changes. That should reduce
the amount of data the queries need to search, and the criteria you use to build
this lookup table can be left out of the actual report queries. A separate query
in the web app. could look at live data for those times when you need
up-to-the-minute results enough to make it worth waiting for.

Hope this helps,

Wes Sheldahl



"colm ennis" <colm.ennis%eircom.net@interlock.lexmark.com> on 12/13/2001
09:16:42 AM

To:   pgsql-general%postgresql.org@interlock.lexmark.com
cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject:  [GENERAL] slow queries on large syslog table


hi all,

my selects using the config below are in some cases extremely slow, i would
love
if someone would point out ways to speed this up bit. note ive only recently
started using postgresql, so feel free to point out the stupid mistakes ive
made.

im using a postgresql db to store cisco syslogs which are pumped in by
msyslog.

my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
standard ports install.

my tables are :
     create table syslog_table (stimestamp timestamp,shostid integer,sciscomsgid
integer, smsg text);
     create table ciscomsg_table (sciscomsgid serial,sciscomsg varchar(128));
     create table host_table (shostid serial,shost varchar(128));

ive created a trigger on insertion to syslog_table which extracts ciscomsg,
and updates the other two tables.

as you can appreciate my syslog table can be quite large and is constantly
growing, it currently has about 1.7 million rows. both ciscomsg_table and
host_table have ~ 80 rows.

ive also created a web interface for selecting syslogs based on optional
combinations of timestamp, hostname and ciscomsg.

to speed queries i created some indexes on syslog_table :
     create index syslog_table_stimestamp_index on syslog_table (stimestamp);
     create index syslog_table_shostid_index on syslog_table (shostid);
     create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
     create index syslog_table_shostid_sciscomsgid_index on syslog_table
(shostid,sciscomsgid);

ive just performed a vacuum analyse for this mail, although i guess syslog
insertions will have been blocked for some of the time this running, so im
trying to avoid doing this.

ok now for queries, row counts, performance and explains.

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) ORDER BY
stimestamp DESC LIMIT 1000
     matching messages - 7212
     query time(s) - 39
     explain - Limit  (cost=0.00..34032.89 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..111846.03 rows=3286 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 5
     query time(s) - 79
     explain - Limit  (cost=75828.24..75828.24 rows=437 width=24)
  ->  Sort  (cost=75828.24..75828.24 rows=438 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..75809.05 rows=438
width=24)

query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-06 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 9
     query time(s) - 73
     explain - Limit  (cost=0.00..66287.05 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..421735.44 rows=6362 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-11-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 19
     query time(s) - 224
     explain - Limit  (cost=0.00..34831.93 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..581118.60 rows=16684 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     total messages/matching messages - /19
     query time(s) - 225
     explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..577149.86 rows=16700 width=24)

i dont really know enough to be able to interpret the explain output but
clearly only the syslog_table_stimestamp_index index is being used.

as you can see the query times are terrible and approach browser timeouts.

please, does anyone have any ideas on how to speed this up?

thanks in advance for any help i get,

colm ennis


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)





pgsql-general by date:

Previous
From: "colm ennis"
Date:
Subject: slow queries on large syslog table
Next
From: Francisco Reyes
Date:
Subject: Re: Select .. Having vs computer fields