slow queries on large syslog table - Mailing list pgsql-general

From colm ennis
Subject slow queries on large syslog table
Date
Msg-id DKEIIKIBPBFHLPDHMIMGEEPBCHAA.colm.ennis@eircom.net
Whole thread Raw
List pgsql-general
hi all,

my selects using the config below are in some cases extremely slow, i would
love
if someone would point out ways to speed this up bit. note ive only recently
started using postgresql, so feel free to point out the stupid mistakes ive
made.

im using a postgresql db to store cisco syslogs which are pumped in by
msyslog.

my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
standard ports install.

my tables are :
    create table syslog_table (stimestamp timestamp,shostid integer,sciscomsgid
integer, smsg text);
    create table ciscomsg_table (sciscomsgid serial,sciscomsg varchar(128));
    create table host_table (shostid serial,shost varchar(128));

ive created a trigger on insertion to syslog_table which extracts ciscomsg,
and updates the other two tables.

as you can appreciate my syslog table can be quite large and is constantly
growing, it currently has about 1.7 million rows. both ciscomsg_table and
host_table have ~ 80 rows.

ive also created a web interface for selecting syslogs based on optional
combinations of timestamp, hostname and ciscomsg.

to speed queries i created some indexes on syslog_table :
    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
    create index syslog_table_shostid_index on syslog_table (shostid);
    create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
    create index syslog_table_shostid_sciscomsgid_index on syslog_table
(shostid,sciscomsgid);

ive just performed a vacuum analyse for this mail, although i guess syslog
insertions will have been blocked for some of the time this running, so im
trying to avoid doing this.

ok now for queries, row counts, performance and explains.

    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) ORDER BY
stimestamp DESC LIMIT 1000
    matching messages - 7212
    query time(s) - 39
    explain - Limit  (cost=0.00..34032.89 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..111846.03 rows=3286 width=24)

    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    matching messages - 5
    query time(s) - 79
    explain - Limit  (cost=75828.24..75828.24 rows=437 width=24)
  ->  Sort  (cost=75828.24..75828.24 rows=438 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..75809.05 rows=438
width=24)

query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-06 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    matching messages - 9
    query time(s) - 73
    explain - Limit  (cost=0.00..66287.05 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..421735.44 rows=6362 width=24)

    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-11-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    matching messages - 19
    query time(s) - 224
    explain - Limit  (cost=0.00..34831.93 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..581118.60 rows=16684 width=24)

    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    total messages/matching messages - /19
    query time(s) - 225
    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..577149.86 rows=16700 width=24)

i dont really know enough to be able to interpret the explain output but
clearly only the syslog_table_stimestamp_index index is being used.

as you can see the query times are terrible and approach browser timeouts.

please, does anyone have any ideas on how to speed this up?

thanks in advance for any help i get,

colm ennis


pgsql-general by date:

Previous
From: Carlo Perassi
Date:
Subject: Re: exit status code table?
Next
From: wsheldah@lexmark.com
Date:
Subject: Re: slow queries on large syslog table