Thread: slow queries on large syslog table

slow queries on large syslog table

From
"colm ennis"
Date:
hi all,

my selects using the config below are in some cases extremely slow, i would
love
if someone would point out ways to speed this up bit. note ive only recently
started using postgresql, so feel free to point out the stupid mistakes ive
made.

im using a postgresql db to store cisco syslogs which are pumped in by
msyslog.

my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
standard ports install.

my tables are :
    create table syslog_table (stimestamp timestamp,shostid integer,sciscomsgid
integer, smsg text);
    create table ciscomsg_table (sciscomsgid serial,sciscomsg varchar(128));
    create table host_table (shostid serial,shost varchar(128));

ive created a trigger on insertion to syslog_table which extracts ciscomsg,
and updates the other two tables.

as you can appreciate my syslog table can be quite large and is constantly
growing, it currently has about 1.7 million rows. both ciscomsg_table and
host_table have ~ 80 rows.

ive also created a web interface for selecting syslogs based on optional
combinations of timestamp, hostname and ciscomsg.

to speed queries i created some indexes on syslog_table :
    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
    create index syslog_table_shostid_index on syslog_table (shostid);
    create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
    create index syslog_table_shostid_sciscomsgid_index on syslog_table
(shostid,sciscomsgid);

ive just performed a vacuum analyse for this mail, although i guess syslog
insertions will have been blocked for some of the time this running, so im
trying to avoid doing this.

ok now for queries, row counts, performance and explains.

    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) ORDER BY
stimestamp DESC LIMIT 1000
    matching messages - 7212
    query time(s) - 39
    explain - Limit  (cost=0.00..34032.89 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..111846.03 rows=3286 width=24)

    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    matching messages - 5
    query time(s) - 79
    explain - Limit  (cost=75828.24..75828.24 rows=437 width=24)
  ->  Sort  (cost=75828.24..75828.24 rows=438 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..75809.05 rows=438
width=24)

query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-06 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    matching messages - 9
    query time(s) - 73
    explain - Limit  (cost=0.00..66287.05 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..421735.44 rows=6362 width=24)

    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-11-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    matching messages - 19
    query time(s) - 224
    explain - Limit  (cost=0.00..34831.93 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..581118.60 rows=16684 width=24)

    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    total messages/matching messages - /19
    query time(s) - 225
    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..577149.86 rows=16700 width=24)

i dont really know enough to be able to interpret the explain output but
clearly only the syslog_table_stimestamp_index index is being used.

as you can see the query times are terrible and approach browser timeouts.

please, does anyone have any ideas on how to speed this up?

thanks in advance for any help i get,

colm ennis


Re: slow queries on large syslog table

From
wsheldah@lexmark.com
Date:

If these are your most common queries, and if they don't have to be
up-to-the-minute, you might consider creating a separate table that just
contains entries that meet your most common criteria, i.e. "WHERE shostid IN
(23,3)", or possibly even more restrictive. Update that table daily or hourly
with a cron job, index it, probably drop and recreate the indexes after a major
update or every so often depending on how much it changes. That should reduce
the amount of data the queries need to search, and the criteria you use to build
this lookup table can be left out of the actual report queries. A separate query
in the web app. could look at live data for those times when you need
up-to-the-minute results enough to make it worth waiting for.

Hope this helps,

Wes Sheldahl



"colm ennis" <colm.ennis%eircom.net@interlock.lexmark.com> on 12/13/2001
09:16:42 AM

To:   pgsql-general%postgresql.org@interlock.lexmark.com
cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject:  [GENERAL] slow queries on large syslog table


hi all,

my selects using the config below are in some cases extremely slow, i would
love
if someone would point out ways to speed this up bit. note ive only recently
started using postgresql, so feel free to point out the stupid mistakes ive
made.

im using a postgresql db to store cisco syslogs which are pumped in by
msyslog.

my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
standard ports install.

my tables are :
     create table syslog_table (stimestamp timestamp,shostid integer,sciscomsgid
integer, smsg text);
     create table ciscomsg_table (sciscomsgid serial,sciscomsg varchar(128));
     create table host_table (shostid serial,shost varchar(128));

ive created a trigger on insertion to syslog_table which extracts ciscomsg,
and updates the other two tables.

as you can appreciate my syslog table can be quite large and is constantly
growing, it currently has about 1.7 million rows. both ciscomsg_table and
host_table have ~ 80 rows.

ive also created a web interface for selecting syslogs based on optional
combinations of timestamp, hostname and ciscomsg.

to speed queries i created some indexes on syslog_table :
     create index syslog_table_stimestamp_index on syslog_table (stimestamp);
     create index syslog_table_shostid_index on syslog_table (shostid);
     create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
     create index syslog_table_shostid_sciscomsgid_index on syslog_table
(shostid,sciscomsgid);

ive just performed a vacuum analyse for this mail, although i guess syslog
insertions will have been blocked for some of the time this running, so im
trying to avoid doing this.

ok now for queries, row counts, performance and explains.

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) ORDER BY
stimestamp DESC LIMIT 1000
     matching messages - 7212
     query time(s) - 39
     explain - Limit  (cost=0.00..34032.89 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..111846.03 rows=3286 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 5
     query time(s) - 79
     explain - Limit  (cost=75828.24..75828.24 rows=437 width=24)
  ->  Sort  (cost=75828.24..75828.24 rows=438 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..75809.05 rows=438
width=24)

query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-06 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 9
     query time(s) - 73
     explain - Limit  (cost=0.00..66287.05 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..421735.44 rows=6362 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-11-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 19
     query time(s) - 224
     explain - Limit  (cost=0.00..34831.93 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..581118.60 rows=16684 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     total messages/matching messages - /19
     query time(s) - 225
     explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..577149.86 rows=16700 width=24)

i dont really know enough to be able to interpret the explain output but
clearly only the syslog_table_stimestamp_index index is being used.

as you can see the query times are terrible and approach browser timeouts.

please, does anyone have any ideas on how to speed this up?

thanks in advance for any help i get,

colm ennis


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)





Re: slow queries on large syslog table

From
Antonio Fiol Bonnín
Date:
>
>
>
>ive also created a web interface for selecting syslogs based on optional
>combinations of timestamp, hostname and ciscomsg.
>

Combinations is the *magic* word.

>to speed queries i created some indexes on syslog_table :
>    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
>    create index syslog_table_shostid_index on syslog_table (shostid);
>    create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
>(shostid,sciscomsgid);
>
Most of them are of no use. That's what you observed...

I bet you will get much better perfs with:

create index syslog_table_stimestamp_shostid_sciscomsg_index on syslog_table (stimestamt, shostid, sciscomsg);

You can try other combinations, but the one I suggested should be of use
in case you use all three on the query, (or even if you use only the
first, or the first two, though not sure about this last part, in
parentheses).


Believe me. Creating an index on a large table is of no use, unless it
is the right one. As a rule of thumb, include in the index as many of
the SELECTIVE columns present in the WHERE clause as you can.

I am not sure of the selectivity of your columns (never used that
particular structure). There should be some information about that on
some of the system tables. However, I do not know in which, or how to
get that info.

As a second rule of thumb, think how YOU would search for the data you
need if it was written on a paper book, and especially, how you would
like to find the book ordered. For example, if you were to look the
address corresponding to a phone number, you would like to find the data
ordered by phone number, and not by name. Name is not of any use to you.

For selectivity, think of finding the phone numbers of all people that
live at number 5, but of any street, and whose first name is Peter.

Neither "Peter" nor "5" are REALLY useful informations to perform your
search. And even, Peter is more useful than 5.

HTH,

Antonio


Re: slow queries on large syslog table

From
"colm ennis"
Date:
hi antonio,

thanks for your advice.

ive tried a lot of different index combinations, with extremely variable
results,
for instance :
    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    matching messages - 19

with original indexes :
    query time(s) - 225
    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..577149.86 rows=16700 width=24)

with antonios index :
    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
syslog_table (stimestamp, shostid, sciscomsgid);
    query time(s) - 174
    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_st_sh_sc_index on syslog_table
(cost=0.00..580639.57 rows=16914 width=24)

with NO! index :
    query time(s) - 77
    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
  ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
width=24)

i got similarily confusing results from other queries.

it occured to me that that the index antonio suggests is going to be huge
because
of the per second timestamp.

i thought about what you said about deciding how I would search for data.

for the query above :
    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
i would lookup shostid and sciscomsgid in a combined index, and then load
the
indexed rows in syslog_table.  there are about 80 different shostids and
sciscomsgids, so i guess the max size of this index would be 1600 rows.

for other queries i would use an shostid index or sciscomsgid if just one of
these
fields appeared in the select, avoiding the stimestamp at all cost because
its index
will be huge.

so to test if this was any good i created the combined index :
    create index syslog_table_sh_sc_index on syslog_table (shostid,
sciscomsgid);
but using explain found it isnt being used? :
    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
  ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
width=24)

i still dont understand how to use indexes to increase the speed of queries.

thanks for your help so far but i still feel lost,

colm ennis

-----Original Message-----
From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
Sent: 13 December 2001 15:29
To: colm ennis; PostgreSQL General Mailing list
Subject: Re: [GENERAL] slow queries on large syslog table


>
>
>
>ive also created a web interface for selecting syslogs based on optional
>combinations of timestamp, hostname and ciscomsg.
>

Combinations is the *magic* word.

>to speed queries i created some indexes on syslog_table :
>    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
>    create index syslog_table_shostid_index on syslog_table (shostid);
>    create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
>(shostid,sciscomsgid);
>
Most of them are of no use. That's what you observed...

I bet you will get much better perfs with:

create index syslog_table_stimestamp_shostid_sciscomsg_index on syslog_table
(stimestamt, shostid, sciscomsg);

You can try other combinations, but the one I suggested should be of use
in case you use all three on the query, (or even if you use only the
first, or the first two, though not sure about this last part, in
parentheses).


Believe me. Creating an index on a large table is of no use, unless it
is the right one. As a rule of thumb, include in the index as many of
the SELECTIVE columns present in the WHERE clause as you can.

I am not sure of the selectivity of your columns (never used that
particular structure). There should be some information about that on
some of the system tables. However, I do not know in which, or how to
get that info.

As a second rule of thumb, think how YOU would search for the data you
need if it was written on a paper book, and especially, how you would
like to find the book ordered. For example, if you were to look the
address corresponding to a phone number, you would like to find the data
ordered by phone number, and not by name. Name is not of any use to you.

For selectivity, think of finding the phone numbers of all people that
live at number 5, but of any street, and whose first name is Peter.

Neither "Peter" nor "5" are REALLY useful informations to perform your
search. And even, Peter is more useful than 5.

HTH,

Antonio



Re: slow queries on large syslog table

From
"colm ennis"
Date:
hi wes,

unfortunately the searches people do can be on any combination of date,
hostid, and ciscomsgid. there are about 80 of each of these.

also as people are usually using this when a problem is detected with a
device, they will want to see the most recent message with regard to this
so im not sure if caching in a secondary table would be practicable.

i have thought about perhaps having a table for each host, and creating
this automatically in the trigger for new hosts. this would take a while
to develop though and the delays involved in creating a new table from
an insert trigger scare me, i fear that we might be dropping syslogs while
msyslog is waiting for this to complete.

i would imagine that the queries might be speeded up by using some other
indexes, well i hope!

thanks for your help,

colm ennis


-----Original Message-----
From: wsheldah@lexmark.com [mailto:wsheldah@lexmark.com]
Sent: 13 December 2001 14:57
To: colm ennis
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] slow queries on large syslog table




If these are your most common queries, and if they don't have to be
up-to-the-minute, you might consider creating a separate table that just
contains entries that meet your most common criteria, i.e. "WHERE shostid IN
(23,3)", or possibly even more restrictive. Update that table daily or
hourly
with a cron job, index it, probably drop and recreate the indexes after a
major
update or every so often depending on how much it changes. That should
reduce
the amount of data the queries need to search, and the criteria you use to
build
this lookup table can be left out of the actual report queries. A separate
query
in the web app. could look at live data for those times when you need
up-to-the-minute results enough to make it worth waiting for.

Hope this helps,

Wes Sheldahl



"colm ennis" <colm.ennis%eircom.net@interlock.lexmark.com> on 12/13/2001
09:16:42 AM

To:   pgsql-general%postgresql.org@interlock.lexmark.com
cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject:  [GENERAL] slow queries on large syslog table


hi all,

my selects using the config below are in some cases extremely slow, i would
love
if someone would point out ways to speed this up bit. note ive only recently
started using postgresql, so feel free to point out the stupid mistakes ive
made.

im using a postgresql db to store cisco syslogs which are pumped in by
msyslog.

my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
standard ports install.

my tables are :
     create table syslog_table (stimestamp timestamp,shostid
integer,sciscomsgid
integer, smsg text);
     create table ciscomsg_table (sciscomsgid serial,sciscomsg
varchar(128));
     create table host_table (shostid serial,shost varchar(128));

ive created a trigger on insertion to syslog_table which extracts ciscomsg,
and updates the other two tables.

as you can appreciate my syslog table can be quite large and is constantly
growing, it currently has about 1.7 million rows. both ciscomsg_table and
host_table have ~ 80 rows.

ive also created a web interface for selecting syslogs based on optional
combinations of timestamp, hostname and ciscomsg.

to speed queries i created some indexes on syslog_table :
     create index syslog_table_stimestamp_index on syslog_table
(stimestamp);
     create index syslog_table_shostid_index on syslog_table (shostid);
     create index syslog_table_sciscomsgid_index on syslog_table
(sciscomsgid);
     create index syslog_table_shostid_sciscomsgid_index on syslog_table
(shostid,sciscomsgid);

ive just performed a vacuum analyse for this mail, although i guess syslog
insertions will have been blocked for some of the time this running, so im
trying to avoid doing this.

ok now for queries, row counts, performance and explains.

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE
(stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) ORDER BY
stimestamp DESC LIMIT 1000
     matching messages - 7212
     query time(s) - 39
     explain - Limit  (cost=0.00..34032.89 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..111846.03 rows=3286 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE
(stimestamp
>= '2001-12-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 5
     query time(s) - 79
     explain - Limit  (cost=75828.24..75828.24 rows=437 width=24)
  ->  Sort  (cost=75828.24..75828.24 rows=438 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..75809.05 rows=438
width=24)

query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-12-06 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 9
     query time(s) - 73
     explain - Limit  (cost=0.00..66287.05 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..421735.44 rows=6362 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE
(stimestamp
>= '2001-11-13 00:00'::timestamp) AND (shostid IN (23,3)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     matching messages - 19
     query time(s) - 224
     explain - Limit  (cost=0.00..34831.93 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..581118.60 rows=16684 width=24)

     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid
IN
(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
     total messages/matching messages - /19
     query time(s) - 225
     explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..577149.86 rows=16700 width=24)

i dont really know enough to be able to interpret the explain output but
clearly only the syslog_table_stimestamp_index index is being used.

as you can see the query times are terrible and approach browser timeouts.

please, does anyone have any ideas on how to speed this up?

thanks in advance for any help i get,

colm ennis


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)






Re: slow queries on large syslog table

From
Antonio Fiol Bonnín
Date:
Are the rows estimations "real"?


colm ennis wrote:

>hi antonio,
>
>thanks for your advice.
>
>ive tried a lot of different index combinations, with extremely variable
>results,
>for instance :
>    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
>(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>    matching messages - 19
>
>with original indexes :
>    query time(s) - 225
>    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
>  ->  Index Scan Backward using syslog_table_stimestamp_index on
>syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
>
>with antonios index :
>    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
>syslog_table (stimestamp, shostid, sciscomsgid);
>    query time(s) - 174
>    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
>  ->  Index Scan Backward using syslog_table_st_sh_sc_index on syslog_table
>(cost=0.00..580639.57 rows=16914 width=24)
>
>with NO! index :
>    query time(s) - 77
>    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
>  ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
>        ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
>width=24)
>
>i got similarily confusing results from other queries.
>
>it occured to me that that the index antonio suggests is going to be huge
>because
>of the per second timestamp.
>
>i thought about what you said about deciding how I would search for data.
>
>for the query above :
>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>i would lookup shostid and sciscomsgid in a combined index, and then load
>the
>indexed rows in syslog_table.  there are about 80 different shostids and
>sciscomsgids, so i guess the max size of this index would be 1600 rows.
>
>for other queries i would use an shostid index or sciscomsgid if just one of
>these
>fields appeared in the select, avoiding the stimestamp at all cost because
>its index
>will be huge.
>
>so to test if this was any good i created the combined index :
>    create index syslog_table_sh_sc_index on syslog_table (shostid,
>sciscomsgid);
>but using explain found it isnt being used? :
>    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
>  ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
>        ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
>width=24)
>
>i still dont understand how to use indexes to increase the speed of queries.
>
>thanks for your help so far but i still feel lost,
>
>colm ennis
>
>-----Original Message-----
>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>Sent: 13 December 2001 15:29
>To: colm ennis; PostgreSQL General Mailing list
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>>
>>
>>ive also created a web interface for selecting syslogs based on optional
>>combinations of timestamp, hostname and ciscomsg.
>>
>
>Combinations is the *magic* word.
>
>>to speed queries i created some indexes on syslog_table :
>>    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
>>    create index syslog_table_shostid_index on syslog_table (shostid);
>>    create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
>>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
>>(shostid,sciscomsgid);
>>
>Most of them are of no use. That's what you observed...
>
>I bet you will get much better perfs with:
>
>create index syslog_table_stimestamp_shostid_sciscomsg_index on syslog_table
>(stimestamt, shostid, sciscomsg);
>
>You can try other combinations, but the one I suggested should be of use
>in case you use all three on the query, (or even if you use only the
>first, or the first two, though not sure about this last part, in
>parentheses).
>
>
>Believe me. Creating an index on a large table is of no use, unless it
>is the right one. As a rule of thumb, include in the index as many of
>the SELECTIVE columns present in the WHERE clause as you can.
>
>I am not sure of the selectivity of your columns (never used that
>particular structure). There should be some information about that on
>some of the system tables. However, I do not know in which, or how to
>get that info.
>
>As a second rule of thumb, think how YOU would search for the data you
>need if it was written on a paper book, and especially, how you would
>like to find the book ordered. For example, if you were to look the
>address corresponding to a phone number, you would like to find the data
>ordered by phone number, and not by name. Name is not of any use to you.
>
>For selectivity, think of finding the phone numbers of all people that
>live at number 5, but of any street, and whose first name is Peter.
>
>Neither "Peter" nor "5" are REALLY useful informations to perform your
>search. And even, Peter is more useful than 5.
>
>HTH,
>
>Antonio
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
>http://archives.postgresql.org
>
>.
>




Re: slow queries on large syslog table

From
Chris Albertson
Date:
I just read your SELECT queries and the time it took.
What kind of computer (RAM, CPU), how many buffers (-B),
what kind od disks
and what is the non-Postgres load on the system at the
time of the query.  Yes it is dead slow.  The query is
not complex.

--- colm ennis <colm.ennis@eircom.net> wrote:
> hi wes,
>
> unfortunately the searches people do can be on any combination of
> date,
> hostid, and ciscomsgid. there are about 80 of each of these.
>
> also as people are usually using this when a problem is detected with
> a
> device, they will want to see the most recent message with regard to
> this
> so im not sure if caching in a secondary table would be practicable.
>
> i have thought about perhaps having

=====
Chris Albertson
  Home:   310-376-1029  chrisalbertson90278@yahoo.com
  Cell:   310-990-7550
  Office: 310-336-5189  Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Check out Yahoo! Shopping and Yahoo! Auctions for all of
your unique holiday gifts! Buy at http://shopping.yahoo.com
or bid at http://auctions.yahoo.com

Re: slow queries on large syslog table

From
Stephan Szabo
Date:
On Thu, 13 Dec 2001, colm ennis wrote:

> hi antonio,
>
> thanks for your advice.
>
> ive tried a lot of different index combinations, with extremely variable
> results,
> for instance :
>     query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
> (23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>     matching messages - 19
>
> with original indexes :
>     query time(s) - 225
>     explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
>   ->  Index Scan Backward using syslog_table_stimestamp_index on
> syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
>
> with NO! index :
>     query time(s) - 77
>     explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
>   ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
>         ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
> width=24)

Have you been running vacuum analyze?  If I'm reading correctly what you
mean,  you've got 19 matching messages, but the estimated return rows is
much much larger than that.


Re: slow queries on large syslog table

From
"colm ennis"
Date:
hi all,

thanks for your help, its comforting but also kinda scary to know
im not the only one whos having trouble!

in response to questions....

as i mentioned before, the syslog_table is currently holds about
1.7 million rows and is constantly slowly growing, the hostid_table
and ciscomdgid_table each hold about 80 rows.

the number of rows returned when i ran the query below :
    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
was 19 in all cases, so i guess? the row estimations are woefully
inaccurate.

i ran a vacuum analyse a few minutes prior to trying these queries.

my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
standard ports install.

with regard to resources, heres the output of top mid select :
    last pid: 77402;  load averages:  0.17,  0.08,  0.03
up 87+05:05:42  23:36:25
    48 processes:  2 running, 46 sleeping
    CPU states: 11.6% user,  0.0% nice,  5.4% system,  0.0% interrupt, 82.9%
idle
    Mem: 47M Active, 139M Inact, 52M Wired, 8784K Cache, 35M Buf, 656K Free
    Swap:

      PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
    77400 pgsql      2   0  5956K  3568K RUN      0:04 21.08% 12.94% postgres
    77399 www        2   0  7896K  6776K select   0:01  2.32%  1.46% perl
    77257 pgsql      2   0  7292K  4832K sbwait   0:04  0.05%  0.05% postgres
    65374 root      10   0  3440K  2696K nanslp  41:00  0.00%  0.00% perl
    74942 pgsql      2   0  8048K  5876K sbwait   8:46  0.00%  0.00% postgres
    75116 root       2   0  2148K  1124K poll     1:30  0.00%  0.00% syslogd
(hmm ...seems like no swap device is configured, but theres loads of
inactive pages anyway i guess)

memory :
    hw.physmem: 264351744
cpu :
    CPU: Pentium III/Pentium III Xeon/Celeron (547.18-MHz 686-class CPU)
disk :
    1 x 9gig scsi

im not using the -B option so i guess im using the3 default number/size
buffers.

hope this helps!

thanks again for all your help a i am completely clueless!

colm ennis

-----Original Message-----
From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
Sent: 13 December 2001 18:14
To: colm ennis
Cc: PostgreSQL General Mailing list
Subject: Re: [GENERAL] slow queries on large syslog table


Are the rows estimations "real"?


colm ennis wrote:

>hi antonio,
>
>thanks for your advice.
>
>ive tried a lot of different index combinations, with extremely variable
>results,
>for instance :
>    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
>(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>    matching messages - 19
>
>with original indexes :
>    query time(s) - 225
>    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
>  ->  Index Scan Backward using syslog_table_stimestamp_index on
>syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
>
>with antonios index :
>    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
>syslog_table (stimestamp, shostid, sciscomsgid);
>    query time(s) - 174
>    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
>  ->  Index Scan Backward using syslog_table_st_sh_sc_index on syslog_table
>(cost=0.00..580639.57 rows=16914 width=24)
>
>with NO! index :
>    query time(s) - 77
>    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
>  ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
>        ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
>width=24)
>
>i got similarily confusing results from other queries.
>
>it occured to me that that the index antonio suggests is going to be huge
>because
>of the per second timestamp.
>
>i thought about what you said about deciding how I would search for data.
>
>for the query above :
>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>i would lookup shostid and sciscomsgid in a combined index, and then load
>the
>indexed rows in syslog_table.  there are about 80 different shostids and
>sciscomsgids, so i guess the max size of this index would be 1600 rows.
>
>for other queries i would use an shostid index or sciscomsgid if just one
of
>these
>fields appeared in the select, avoiding the stimestamp at all cost because
>its index
>will be huge.
>
>so to test if this was any good i created the combined index :
>    create index syslog_table_sh_sc_index on syslog_table (shostid,
>sciscomsgid);
>but using explain found it isnt being used? :
>    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
>  ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
>        ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
>width=24)
>
>i still dont understand how to use indexes to increase the speed of
queries.
>
>thanks for your help so far but i still feel lost,
>
>colm ennis
>
>-----Original Message-----
>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>Sent: 13 December 2001 15:29
>To: colm ennis; PostgreSQL General Mailing list
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>>
>>
>>ive also created a web interface for selecting syslogs based on optional
>>combinations of timestamp, hostname and ciscomsg.
>>
>
>Combinations is the *magic* word.
>
>>to speed queries i created some indexes on syslog_table :
>>    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
>>    create index syslog_table_shostid_index on syslog_table (shostid);
>>    create index syslog_table_sciscomsgid_index on syslog_table
(sciscomsgid);
>>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
>>(shostid,sciscomsgid);
>>
>Most of them are of no use. That's what you observed...
>
>I bet you will get much better perfs with:
>
>create index syslog_table_stimestamp_shostid_sciscomsg_index on
syslog_table
>(stimestamt, shostid, sciscomsg);
>
>You can try other combinations, but the one I suggested should be of use
>in case you use all three on the query, (or even if you use only the
>first, or the first two, though not sure about this last part, in
>parentheses).
>
>
>Believe me. Creating an index on a large table is of no use, unless it
>is the right one. As a rule of thumb, include in the index as many of
>the SELECTIVE columns present in the WHERE clause as you can.
>
>I am not sure of the selectivity of your columns (never used that
>particular structure). There should be some information about that on
>some of the system tables. However, I do not know in which, or how to
>get that info.
>
>As a second rule of thumb, think how YOU would search for the data you
>need if it was written on a paper book, and especially, how you would
>like to find the book ordered. For example, if you were to look the
>address corresponding to a phone number, you would like to find the data
>ordered by phone number, and not by name. Name is not of any use to you.
>
>For selectivity, think of finding the phone numbers of all people that
>live at number 5, but of any street, and whose first name is Peter.
>
>Neither "Peter" nor "5" are REALLY useful informations to perform your
>search. And even, Peter is more useful than 5.
>
>HTH,
>
>Antonio
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
>http://archives.postgresql.org
>
>.
>





Re: slow queries on large syslog table

From
Chris Albertson
Date:
--- colm ennis <colm.ennis@eircom.net> wrote:

>
> im not using the -B option so i guess im using the3 default
> number/size
> buffers.

That is the number one thing you can do to help.
The default buffer size is 64 * 8K = 0.5M. Half
a megabyte is pretty darm small.  Try going to 100Mb
and see what hapens.  Thats a 500x increase.

=====
Chris Albertson
  Home:   310-376-1029  chrisalbertson90278@yahoo.com
  Cell:   310-990-7550
  Office: 310-336-5189  Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Check out Yahoo! Shopping and Yahoo! Auctions for all of
your unique holiday gifts! Buy at http://shopping.yahoo.com
or bid at http://auctions.yahoo.com

Re: slow queries on large syslog table

From
Stephan Szabo
Date:
On Thu, 13 Dec 2001, colm ennis wrote:

> hi all,
>
> thanks for your help, its comforting but also kinda scary to know
> im not the only one whos having trouble!
>
> in response to questions....
>
> as i mentioned before, the syslog_table is currently holds about
> 1.7 million rows and is constantly slowly growing, the hostid_table
> and ciscomdgid_table each hold about 80 rows.
>
> the number of rows returned when i ran the query below :
>     SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
> AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
> was 19 in all cases, so i guess? the row estimations are woefully
> inaccurate.
>
> i ran a vacuum analyse a few minutes prior to trying these queries.

What does:

select * from pg_statistic,pg_class
 where starelid=pg_class.oid and
  relname='syslog_table';
give you?  (That's the evaluated statistics)

I'm wondering if there's very common values that's throwing off the
estimates.

> im not using the -B option so i guess im using the3 default number/size
> buffers.

Well, you'll definately want to change that :).  I'd suggest going into
your postgresql.conf (in the data directory) and raising shared_buffers
and sort_mem. You'll have to play with the values to find the right point
between postgresql's buffers and the system's, but maybe start in the
low thousands.


Re: slow queries on large syslog table

From
Jochem van Dieten
Date:
Stephan Szabo wrote:

> On Thu, 13 Dec 2001, colm ennis wrote:
>>
>>with original indexes :
>>    query time(s) - 225
>>    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
>>  ->  Index Scan Backward using syslog_table_stimestamp_index on
>>syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
>>
>>with NO! index :
>>    query time(s) - 77
>>    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
>>  ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
>>        ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
>>width=24)
>>
>
> Have you been running vacuum analyze?  If I'm reading correctly what you
> mean,  you've got 19 matching messages, but the estimated return rows is
> much much larger than that.


I would guess that what is returned as the estimates for the amount of
matching rows is the default estimate of 1%. There is a setting
somewhere(?) where you can change the default estimates. It would be
interesting to know what happens to the default query plans if you
change these settings.

Jochem



Re: slow queries on large syslog table

From
Jochem van Dieten
Date:
colm ennis wrote:

>
> im using a postgresql db to store cisco syslogs which are pumped in by
> msyslog.
>
> my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
> standard ports install.
>
> my tables are :
>     create table syslog_table (stimestamp timestamp,shostid integer,sciscomsgid
> integer, smsg text);
>     create table ciscomsg_table (sciscomsgid serial,sciscomsg varchar(128));
>     create table host_table (shostid serial,shost varchar(128));
>
> ive created a trigger on insertion to syslog_table which extracts ciscomsg,
> and updates the other two tables.

So what is in the smsg field if the ciscomsg is extracted by the trigger
and placed in a separate table? Something so big you really need a text
field?

Jochem


Re: slow queries on large syslog table

From
Antonio Fiol Bonnín
Date:
Hi again!

For a table that contains an URL id (~2000 different ones), a zone id
(max. 30 different ones), a timestamp (one every 3 hours for every pair
(url, zone) and 5 results columns for the combination (url, zone, time).

So, I get about 2000*(30/3)*1 INSERTs on that table every 3 hours. That
makes a 5 Million tuples table of it by now. 1/3 is an estimative
factor, as not every couple (url, zone) is valid.

For queries, I always ask the same type of information, more or less:

SELECT * FROM T WHERE stamp BETWEEN xxxxx AND yyyyy AND (zone=zz OR
zone=ww [...]) AND url=uuuu;
where xxxxx, yyyyy, zz, ww and uuuu are arbitrary constants of the
needed types.

Requests from that table take about 1 second. An index using all three
columns (url, zone, stamp), in that order is created.

If that index is not present, or is the wrong order, nothing works
(requests take for ages).

WHICH LEADS ME TO THINK THAT I GAVE YOU THE WRONG ANSWER!!!!!

Put the timestamp as the last column at the index.

It is MUCH better if the first columns match as EXACTLY as possible. If
an index matches a range, results may be good, BUT if it matches a
CONSTANT, results are wonderful!!

Well, maybe I am going a bit too far with my last sentence, but you get
it, don't you?

With the idea about the phone book again. You were true that you'd
better go directly to the page containing the exact data you need than
searching for any data range at all.

BUT FORGET about the index size, unless space is a major limitation for
you. These days, disk space is cheap ;-)

Sincerely, It is really worth trying.

Antonio


colm ennis wrote:

>hi all,
>
>thanks for your help, its comforting but also kinda scary to know
>im not the only one whos having trouble!
>
>in response to questions....
>
>as i mentioned before, the syslog_table is currently holds about
>1.7 million rows and is constantly slowly growing, the hostid_table
>and ciscomdgid_table each hold about 80 rows.
>
>the number of rows returned when i ran the query below :
>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>was 19 in all cases, so i guess? the row estimations are woefully
>inaccurate.
>
>i ran a vacuum analyse a few minutes prior to trying these queries.
>
>my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
>standard ports install.
>
>with regard to resources, heres the output of top mid select :
>    last pid: 77402;  load averages:  0.17,  0.08,  0.03
>up 87+05:05:42  23:36:25
>    48 processes:  2 running, 46 sleeping
>    CPU states: 11.6% user,  0.0% nice,  5.4% system,  0.0% interrupt, 82.9%
>idle
>    Mem: 47M Active, 139M Inact, 52M Wired, 8784K Cache, 35M Buf, 656K Free
>    Swap:
>
>      PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>    77400 pgsql      2   0  5956K  3568K RUN      0:04 21.08% 12.94% postgres
>    77399 www        2   0  7896K  6776K select   0:01  2.32%  1.46% perl
>    77257 pgsql      2   0  7292K  4832K sbwait   0:04  0.05%  0.05% postgres
>    65374 root      10   0  3440K  2696K nanslp  41:00  0.00%  0.00% perl
>    74942 pgsql      2   0  8048K  5876K sbwait   8:46  0.00%  0.00% postgres
>    75116 root       2   0  2148K  1124K poll     1:30  0.00%  0.00% syslogd
>(hmm ...seems like no swap device is configured, but theres loads of
>inactive pages anyway i guess)
>
>memory :
>    hw.physmem: 264351744
>cpu :
>    CPU: Pentium III/Pentium III Xeon/Celeron (547.18-MHz 686-class CPU)
>disk :
>    1 x 9gig scsi
>
>im not using the -B option so i guess im using the3 default number/size
>buffers.
>
>hope this helps!
>
>thanks again for all your help a i am completely clueless!
>
>colm ennis
>
>-----Original Message-----
>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>Sent: 13 December 2001 18:14
>To: colm ennis
>Cc: PostgreSQL General Mailing list
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>Are the rows estimations "real"?
>
>
>colm ennis wrote:
>
>>hi antonio,
>>
>>thanks for your advice.
>>
>>ive tried a lot of different index combinations, with extremely variable
>>results,
>>for instance :
>>    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
>>(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>>    matching messages - 19
>>
>>with original indexes :
>>    query time(s) - 225
>>    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
>> ->  Index Scan Backward using syslog_table_stimestamp_index on
>>syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
>>
>>with antonios index :
>>    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
>>syslog_table (stimestamp, shostid, sciscomsgid);
>>    query time(s) - 174
>>    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
>> ->  Index Scan Backward using syslog_table_st_sh_sc_index on syslog_table
>>(cost=0.00..580639.57 rows=16914 width=24)
>>
>>with NO! index :
>>    query time(s) - 77
>>    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
>> ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
>>       ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
>>width=24)
>>
>>i got similarily confusing results from other queries.
>>
>>it occured to me that that the index antonio suggests is going to be huge
>>because
>>of the per second timestamp.
>>
>>i thought about what you said about deciding how I would search for data.
>>
>>for the query above :
>>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>>i would lookup shostid and sciscomsgid in a combined index, and then load
>>the
>>indexed rows in syslog_table.  there are about 80 different shostids and
>>sciscomsgids, so i guess the max size of this index would be 1600 rows.
>>
>>for other queries i would use an shostid index or sciscomsgid if just one
>>
>of
>
>>these
>>fields appeared in the select, avoiding the stimestamp at all cost because
>>its index
>>will be huge.
>>
>>so to test if this was any good i created the combined index :
>>    create index syslog_table_sh_sc_index on syslog_table (shostid,
>>sciscomsgid);
>>but using explain found it isnt being used? :
>>    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
>> ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
>>       ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
>>width=24)
>>
>>i still dont understand how to use indexes to increase the speed of
>>
>queries.
>
>>thanks for your help so far but i still feel lost,
>>
>>colm ennis
>>
>>-----Original Message-----
>>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>>Sent: 13 December 2001 15:29
>>To: colm ennis; PostgreSQL General Mailing list
>>Subject: Re: [GENERAL] slow queries on large syslog table
>>
>>
>>>
>>>ive also created a web interface for selecting syslogs based on optional
>>>combinations of timestamp, hostname and ciscomsg.
>>>
>>Combinations is the *magic* word.
>>
>>>to speed queries i created some indexes on syslog_table :
>>>    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
>>>    create index syslog_table_shostid_index on syslog_table (shostid);
>>>    create index syslog_table_sciscomsgid_index on syslog_table
>>>
>(sciscomsgid);
>
>>>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
>>>(shostid,sciscomsgid);
>>>
>>Most of them are of no use. That's what you observed...
>>
>>I bet you will get much better perfs with:
>>
>>create index syslog_table_stimestamp_shostid_sciscomsg_index on
>>
>syslog_table
>
>>(stimestamt, shostid, sciscomsg);
>>
>>You can try other combinations, but the one I suggested should be of use
>>in case you use all three on the query, (or even if you use only the
>>first, or the first two, though not sure about this last part, in
>>parentheses).
>>
>>
>>Believe me. Creating an index on a large table is of no use, unless it
>>is the right one. As a rule of thumb, include in the index as many of
>>the SELECTIVE columns present in the WHERE clause as you can.
>>
>>I am not sure of the selectivity of your columns (never used that
>>particular structure). There should be some information about that on
>>some of the system tables. However, I do not know in which, or how to
>>get that info.
>>
>>As a second rule of thumb, think how YOU would search for the data you
>>need if it was written on a paper book, and especially, how you would
>>like to find the book ordered. For example, if you were to look the
>>address corresponding to a phone number, you would like to find the data
>>ordered by phone number, and not by name. Name is not of any use to you.
>>
>>For selectivity, think of finding the phone numbers of all people that
>>live at number 5, but of any street, and whose first name is Peter.
>>
>>Neither "Peter" nor "5" are REALLY useful informations to perform your
>>search. And even, Peter is more useful than 5.
>>
>>HTH,
>>
>>Antonio
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 6: Have you searched our list archives?
>>
>>http://archives.postgresql.org
>>
>>.
>>
>
>
>
>
>.
>




Re: slow queries on large syslog table

From
"omid omoomi"
Date:
hi,
have you tried your queries with out order by clause?
That might be significant.

regards
Omid Omoomi

>From: "colm ennis" <colm.ennis@eircom.net>
>To: Antonio Fiol Bonnin <fiol@w3ping.com>
>CC: "PostgreSQL General Mailing list" <pgsql-general@postgresql.org>
>Subject: Re: [GENERAL] slow queries on large syslog table
>Date: Thu, 13 Dec 2001 23:58:32 -0000
>
>hi all,
>
>thanks for your help, its comforting but also kinda scary to know
>im not the only one whos having trouble!
>
>in response to questions....
>
>as i mentioned before, the syslog_table is currently holds about
>1.7 million rows and is constantly slowly growing, the hostid_table
>and ciscomdgid_table each hold about 80 rows.
>
>the number of rows returned when i ran the query below :
>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>was 19 in all cases, so i guess? the row estimations are woefully
>inaccurate.
>
>i ran a vacuum analyse a few minutes prior to trying these queries.
>
>my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
>standard ports install.
>
>with regard to resources, heres the output of top mid select :
>    last pid: 77402;  load averages:  0.17,  0.08,  0.03
>up 87+05:05:42  23:36:25
>    48 processes:  2 running, 46 sleeping
>    CPU states: 11.6% user,  0.0% nice,  5.4% system,  0.0% interrupt, 82.9%
>idle
>    Mem: 47M Active, 139M Inact, 52M Wired, 8784K Cache, 35M Buf, 656K Free
>    Swap:
>
>      PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>    77400 pgsql      2   0  5956K  3568K RUN      0:04 21.08% 12.94% postgres
>    77399 www        2   0  7896K  6776K select   0:01  2.32%  1.46% perl
>    77257 pgsql      2   0  7292K  4832K sbwait   0:04  0.05%  0.05% postgres
>    65374 root      10   0  3440K  2696K nanslp  41:00  0.00%  0.00% perl
>    74942 pgsql      2   0  8048K  5876K sbwait   8:46  0.00%  0.00% postgres
>    75116 root       2   0  2148K  1124K poll     1:30  0.00%  0.00% syslogd
>(hmm ...seems like no swap device is configured, but theres loads of
>inactive pages anyway i guess)
>
>memory :
>    hw.physmem: 264351744
>cpu :
>    CPU: Pentium III/Pentium III Xeon/Celeron (547.18-MHz 686-class CPU)
>disk :
>    1 x 9gig scsi
>
>im not using the -B option so i guess im using the3 default number/size
>buffers.
>
>hope this helps!
>
>thanks again for all your help a i am completely clueless!
>
>colm ennis
>
>-----Original Message-----
>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>Sent: 13 December 2001 18:14
>To: colm ennis
>Cc: PostgreSQL General Mailing list
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>Are the rows estimations "real"?
>
>
>colm ennis wrote:
>
> >hi antonio,
> >
> >thanks for your advice.
> >
> >ive tried a lot of different index combinations, with extremely variable
> >results,
> >for instance :
> >    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid
>IN
> >(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT
>1000
> >    matching messages - 19
> >
> >with original indexes :
> >    query time(s) - 225
> >    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
> >  ->  Index Scan Backward using syslog_table_stimestamp_index on
> >syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
> >
> >with antonios index :
> >    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
> >syslog_table (stimestamp, shostid, sciscomsgid);
> >    query time(s) - 174
> >    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
> >  ->  Index Scan Backward using syslog_table_st_sh_sc_index on
>syslog_table
> >(cost=0.00..580639.57 rows=16914 width=24)
> >
> >with NO! index :
> >    query time(s) - 77
> >    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
> >  ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
> >        ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
> >width=24)
> >
> >i got similarily confusing results from other queries.
> >
> >it occured to me that that the index antonio suggests is going to be huge
> >because
> >of the per second timestamp.
> >
> >i thought about what you said about deciding how I would search for data.
> >
> >for the query above :
> >    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
>(23,3))
> >AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
> >i would lookup shostid and sciscomsgid in a combined index, and then load
> >the
> >indexed rows in syslog_table.  there are about 80 different shostids and
> >sciscomsgids, so i guess the max size of this index would be 1600 rows.
> >
> >for other queries i would use an shostid index or sciscomsgid if just one
>of
> >these
> >fields appeared in the select, avoiding the stimestamp at all cost
>because
> >its index
> >will be huge.
> >
> >so to test if this was any good i created the combined index :
> >    create index syslog_table_sh_sc_index on syslog_table (shostid,
> >sciscomsgid);
> >but using explain found it isnt being used? :
> >    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
> >  ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
> >        ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
> >width=24)
> >
> >i still dont understand how to use indexes to increase the speed of
>queries.
> >
> >thanks for your help so far but i still feel lost,
> >
> >colm ennis
> >
> >-----Original Message-----
> >From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
> >Sent: 13 December 2001 15:29
> >To: colm ennis; PostgreSQL General Mailing list
> >Subject: Re: [GENERAL] slow queries on large syslog table
> >
> >
> >>
> >>
> >>ive also created a web interface for selecting syslogs based on optional
> >>combinations of timestamp, hostname and ciscomsg.
> >>
> >
> >Combinations is the *magic* word.
> >
> >>to speed queries i created some indexes on syslog_table :
> >>    create index syslog_table_stimestamp_index on syslog_table
>(stimestamp);
> >>    create index syslog_table_shostid_index on syslog_table (shostid);
> >>    create index syslog_table_sciscomsgid_index on syslog_table
>(sciscomsgid);
> >>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
> >>(shostid,sciscomsgid);
> >>
> >Most of them are of no use. That's what you observed...
> >
> >I bet you will get much better perfs with:
> >
> >create index syslog_table_stimestamp_shostid_sciscomsg_index on
>syslog_table
> >(stimestamt, shostid, sciscomsg);
> >
> >You can try other combinations, but the one I suggested should be of use
> >in case you use all three on the query, (or even if you use only the
> >first, or the first two, though not sure about this last part, in
> >parentheses).
> >
> >
> >Believe me. Creating an index on a large table is of no use, unless it
> >is the right one. As a rule of thumb, include in the index as many of
> >the SELECTIVE columns present in the WHERE clause as you can.
> >
> >I am not sure of the selectivity of your columns (never used that
> >particular structure). There should be some information about that on
> >some of the system tables. However, I do not know in which, or how to
> >get that info.
> >
> >As a second rule of thumb, think how YOU would search for the data you
> >need if it was written on a paper book, and especially, how you would
> >like to find the book ordered. For example, if you were to look the
> >address corresponding to a phone number, you would like to find the data
> >ordered by phone number, and not by name. Name is not of any use to you.
> >
> >For selectivity, think of finding the phone numbers of all people that
> >live at number 5, but of any street, and whose first name is Peter.
> >
> >Neither "Peter" nor "5" are REALLY useful informations to perform your
> >search. And even, Peter is more useful than 5.
> >
> >HTH,
> >
> >Antonio
> >
> >
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 6: Have you searched our list archives?
> >
> >http://archives.postgresql.org
> >
> >.
> >
>
>
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html




_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp.


Re: slow queries on large syslog table

From
"colm ennis"
Date:
hi omid,

i have and it makes little difference to the query time or to the
explain im afraid,

colm

-----Original Message-----
From: omid omoomi [mailto:oomoomi@hotmail.com]
Sent: 14 December 2001 10:23
To: colm.ennis@eircom.net; fiol@w3ping.com
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] slow queries on large syslog table


hi,
have you tried your queries with out order by clause?
That might be significant.

regards
Omid Omoomi

>From: "colm ennis" <colm.ennis@eircom.net>
>To: Antonio Fiol Bonnin <fiol@w3ping.com>
>CC: "PostgreSQL General Mailing list" <pgsql-general@postgresql.org>
>Subject: Re: [GENERAL] slow queries on large syslog table
>Date: Thu, 13 Dec 2001 23:58:32 -0000
>
>hi all,
>
>thanks for your help, its comforting but also kinda scary to know
>im not the only one whos having trouble!
>
>in response to questions....
>
>as i mentioned before, the syslog_table is currently holds about
>1.7 million rows and is constantly slowly growing, the hostid_table
>and ciscomdgid_table each hold about 80 rows.
>
>the number of rows returned when i ran the query below :
>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>was 19 in all cases, so i guess? the row estimations are woefully
>inaccurate.
>
>i ran a vacuum analyse a few minutes prior to trying these queries.
>
>my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
>standard ports install.
>
>with regard to resources, heres the output of top mid select :
>    last pid: 77402;  load averages:  0.17,  0.08,  0.03
>up 87+05:05:42  23:36:25
>    48 processes:  2 running, 46 sleeping
>    CPU states: 11.6% user,  0.0% nice,  5.4% system,  0.0% interrupt, 82.9%
>idle
>    Mem: 47M Active, 139M Inact, 52M Wired, 8784K Cache, 35M Buf, 656K Free
>    Swap:
>
>      PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>    77400 pgsql      2   0  5956K  3568K RUN      0:04 21.08% 12.94% postgres
>    77399 www        2   0  7896K  6776K select   0:01  2.32%  1.46% perl
>    77257 pgsql      2   0  7292K  4832K sbwait   0:04  0.05%  0.05% postgres
>    65374 root      10   0  3440K  2696K nanslp  41:00  0.00%  0.00% perl
>    74942 pgsql      2   0  8048K  5876K sbwait   8:46  0.00%  0.00% postgres
>    75116 root       2   0  2148K  1124K poll     1:30  0.00%  0.00% syslogd
>(hmm ...seems like no swap device is configured, but theres loads of
>inactive pages anyway i guess)
>
>memory :
>    hw.physmem: 264351744
>cpu :
>    CPU: Pentium III/Pentium III Xeon/Celeron (547.18-MHz 686-class CPU)
>disk :
>    1 x 9gig scsi
>
>im not using the -B option so i guess im using the3 default number/size
>buffers.
>
>hope this helps!
>
>thanks again for all your help a i am completely clueless!
>
>colm ennis
>
>-----Original Message-----
>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>Sent: 13 December 2001 18:14
>To: colm ennis
>Cc: PostgreSQL General Mailing list
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>Are the rows estimations "real"?
>
>
>colm ennis wrote:
>
> >hi antonio,
> >
> >thanks for your advice.
> >
> >ive tried a lot of different index combinations, with extremely variable
> >results,
> >for instance :
> >    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid
>IN
> >(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT
>1000
> >    matching messages - 19
> >
> >with original indexes :
> >    query time(s) - 225
> >    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
> >  ->  Index Scan Backward using syslog_table_stimestamp_index on
> >syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
> >
> >with antonios index :
> >    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
> >syslog_table (stimestamp, shostid, sciscomsgid);
> >    query time(s) - 174
> >    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
> >  ->  Index Scan Backward using syslog_table_st_sh_sc_index on
>syslog_table
> >(cost=0.00..580639.57 rows=16914 width=24)
> >
> >with NO! index :
> >    query time(s) - 77
> >    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
> >  ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
> >        ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
> >width=24)
> >
> >i got similarily confusing results from other queries.
> >
> >it occured to me that that the index antonio suggests is going to be huge
> >because
> >of the per second timestamp.
> >
> >i thought about what you said about deciding how I would search for data.
> >
> >for the query above :
> >    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
>(23,3))
> >AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
> >i would lookup shostid and sciscomsgid in a combined index, and then load
> >the
> >indexed rows in syslog_table.  there are about 80 different shostids and
> >sciscomsgids, so i guess the max size of this index would be 1600 rows.
> >
> >for other queries i would use an shostid index or sciscomsgid if just one
>of
> >these
> >fields appeared in the select, avoiding the stimestamp at all cost
>because
> >its index
> >will be huge.
> >
> >so to test if this was any good i created the combined index :
> >    create index syslog_table_sh_sc_index on syslog_table (shostid,
> >sciscomsgid);
> >but using explain found it isnt being used? :
> >    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
> >  ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
> >        ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
> >width=24)
> >
> >i still dont understand how to use indexes to increase the speed of
>queries.
> >
> >thanks for your help so far but i still feel lost,
> >
> >colm ennis
> >
> >-----Original Message-----
> >From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
> >Sent: 13 December 2001 15:29
> >To: colm ennis; PostgreSQL General Mailing list
> >Subject: Re: [GENERAL] slow queries on large syslog table
> >
> >
> >>
> >>
> >>ive also created a web interface for selecting syslogs based on optional
> >>combinations of timestamp, hostname and ciscomsg.
> >>
> >
> >Combinations is the *magic* word.
> >
> >>to speed queries i created some indexes on syslog_table :
> >>    create index syslog_table_stimestamp_index on syslog_table
>(stimestamp);
> >>    create index syslog_table_shostid_index on syslog_table (shostid);
> >>    create index syslog_table_sciscomsgid_index on syslog_table
>(sciscomsgid);
> >>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
> >>(shostid,sciscomsgid);
> >>
> >Most of them are of no use. That's what you observed...
> >
> >I bet you will get much better perfs with:
> >
> >create index syslog_table_stimestamp_shostid_sciscomsg_index on
>syslog_table
> >(stimestamt, shostid, sciscomsg);
> >
> >You can try other combinations, but the one I suggested should be of use
> >in case you use all three on the query, (or even if you use only the
> >first, or the first two, though not sure about this last part, in
> >parentheses).
> >
> >
> >Believe me. Creating an index on a large table is of no use, unless it
> >is the right one. As a rule of thumb, include in the index as many of
> >the SELECTIVE columns present in the WHERE clause as you can.
> >
> >I am not sure of the selectivity of your columns (never used that
> >particular structure). There should be some information about that on
> >some of the system tables. However, I do not know in which, or how to
> >get that info.
> >
> >As a second rule of thumb, think how YOU would search for the data you
> >need if it was written on a paper book, and especially, how you would
> >like to find the book ordered. For example, if you were to look the
> >address corresponding to a phone number, you would like to find the data
> >ordered by phone number, and not by name. Name is not of any use to you.
> >
> >For selectivity, think of finding the phone numbers of all people that
> >live at number 5, but of any street, and whose first name is Peter.
> >
> >Neither "Peter" nor "5" are REALLY useful informations to perform your
> >search. And even, Peter is more useful than 5.
> >
> >HTH,
> >
> >Antonio
> >
> >
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 6: Have you searched our list archives?
> >
> >http://archives.postgresql.org
> >
> >.
> >
>
>
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html




_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp.



Re: slow queries on large syslog table

From
"colm ennis"
Date:
ok antonio,

heres what i did :

    create index syslog_table_sh_sc_st_index on syslog_table (shostid,
sciscomsgid, stimestamp);

    vacuum analyze;

    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000;
    matching messages - 49
    query time(s) - 114

    EXPLAIN SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000;
    Limit  (cost=77745.03..77745.03 rows=1000 width=24)
  ->  Sort  (cost=77745.03..77745.03 rows=17715 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..76285.88 rows=17715
width=24)

    EXPLAIN SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
(23,3)) AND (sciscomsgid IN (41,32,70));
    Seq Scan on syslog_table  (cost=0.00..76285.88 rows=17715 width=24)

so maybe the index is ignored because the timestamp isnt in the query:

    EXPLAIN SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (stimestamp
>= '2001-11-14 00:00'::timestamp) AND (shostid IN (30,22)) AND (sciscomsgid
IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000;
    Limit  (cost=82190.08..82190.08 rows=1000 width=24)
  ->  Sort  (cost=82190.08..82190.08 rows=17702 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..80731.90 rows=17702
width=24)

so how fast is it without the index?
    drop index syslog_table_sh_sc_st_index;

    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (30,22))
AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    matching messages - 49
    query time(s) - 77

faster!!! sorry my results are so confusing,

colm

-----Original Message-----
From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
Sent: 14 December 2001 08:14
To: colm ennis
Cc: PostgreSQL General Mailing list
Subject: Re: [GENERAL] slow queries on large syslog table


Hi again!

For a table that contains an URL id (~2000 different ones), a zone id
(max. 30 different ones), a timestamp (one every 3 hours for every pair
(url, zone) and 5 results columns for the combination (url, zone, time).

So, I get about 2000*(30/3)*1 INSERTs on that table every 3 hours. That
makes a 5 Million tuples table of it by now. 1/3 is an estimative
factor, as not every couple (url, zone) is valid.

For queries, I always ask the same type of information, more or less:

SELECT * FROM T WHERE stamp BETWEEN xxxxx AND yyyyy AND (zone=zz OR
zone=ww [...]) AND url=uuuu;
where xxxxx, yyyyy, zz, ww and uuuu are arbitrary constants of the
needed types.

Requests from that table take about 1 second. An index using all three
columns (url, zone, stamp), in that order is created.

If that index is not present, or is the wrong order, nothing works
(requests take for ages).

WHICH LEADS ME TO THINK THAT I GAVE YOU THE WRONG ANSWER!!!!!

Put the timestamp as the last column at the index.

It is MUCH better if the first columns match as EXACTLY as possible. If
an index matches a range, results may be good, BUT if it matches a
CONSTANT, results are wonderful!!

Well, maybe I am going a bit too far with my last sentence, but you get
it, don't you?

With the idea about the phone book again. You were true that you'd
better go directly to the page containing the exact data you need than
searching for any data range at all.

BUT FORGET about the index size, unless space is a major limitation for
you. These days, disk space is cheap ;-)

Sincerely, It is really worth trying.

Antonio


colm ennis wrote:

>hi all,
>
>thanks for your help, its comforting but also kinda scary to know
>im not the only one whos having trouble!
>
>in response to questions....
>
>as i mentioned before, the syslog_table is currently holds about
>1.7 million rows and is constantly slowly growing, the hostid_table
>and ciscomdgid_table each hold about 80 rows.
>
>the number of rows returned when i ran the query below :
>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>was 19 in all cases, so i guess? the row estimations are woefully
>inaccurate.
>
>i ran a vacuum analyse a few minutes prior to trying these queries.
>
>my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
>standard ports install.
>
>with regard to resources, heres the output of top mid select :
>    last pid: 77402;  load averages:  0.17,  0.08,  0.03
>up 87+05:05:42  23:36:25
>    48 processes:  2 running, 46 sleeping
>    CPU states: 11.6% user,  0.0% nice,  5.4% system,  0.0% interrupt, 82.9%
>idle
>    Mem: 47M Active, 139M Inact, 52M Wired, 8784K Cache, 35M Buf, 656K Free
>    Swap:
>
>      PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>    77400 pgsql      2   0  5956K  3568K RUN      0:04 21.08% 12.94% postgres
>    77399 www        2   0  7896K  6776K select   0:01  2.32%  1.46% perl
>    77257 pgsql      2   0  7292K  4832K sbwait   0:04  0.05%  0.05% postgres
>    65374 root      10   0  3440K  2696K nanslp  41:00  0.00%  0.00% perl
>    74942 pgsql      2   0  8048K  5876K sbwait   8:46  0.00%  0.00% postgres
>    75116 root       2   0  2148K  1124K poll     1:30  0.00%  0.00% syslogd
>(hmm ...seems like no swap device is configured, but theres loads of
>inactive pages anyway i guess)
>
>memory :
>    hw.physmem: 264351744
>cpu :
>    CPU: Pentium III/Pentium III Xeon/Celeron (547.18-MHz 686-class CPU)
>disk :
>    1 x 9gig scsi
>
>im not using the -B option so i guess im using the3 default number/size
>buffers.
>
>hope this helps!
>
>thanks again for all your help a i am completely clueless!
>
>colm ennis
>
>-----Original Message-----
>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>Sent: 13 December 2001 18:14
>To: colm ennis
>Cc: PostgreSQL General Mailing list
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>Are the rows estimations "real"?
>
>
>colm ennis wrote:
>
>>hi antonio,
>>
>>thanks for your advice.
>>
>>ive tried a lot of different index combinations, with extremely variable
>>results,
>>for instance :
>>    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid
IN
>>(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT
1000
>>    matching messages - 19
>>
>>with original indexes :
>>    query time(s) - 225
>>    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
>> ->  Index Scan Backward using syslog_table_stimestamp_index on
>>syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
>>
>>with antonios index :
>>    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
>>syslog_table (stimestamp, shostid, sciscomsgid);
>>    query time(s) - 174
>>    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
>> ->  Index Scan Backward using syslog_table_st_sh_sc_index on syslog_table
>>(cost=0.00..580639.57 rows=16914 width=24)
>>
>>with NO! index :
>>    query time(s) - 77
>>    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
>> ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
>>       ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
>>width=24)
>>
>>i got similarily confusing results from other queries.
>>
>>it occured to me that that the index antonio suggests is going to be huge
>>because
>>of the per second timestamp.
>>
>>i thought about what you said about deciding how I would search for data.
>>
>>for the query above :
>>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
(23,3))
>>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>>i would lookup shostid and sciscomsgid in a combined index, and then load
>>the
>>indexed rows in syslog_table.  there are about 80 different shostids and
>>sciscomsgids, so i guess the max size of this index would be 1600 rows.
>>
>>for other queries i would use an shostid index or sciscomsgid if just one
>>
>of
>
>>these
>>fields appeared in the select, avoiding the stimestamp at all cost because
>>its index
>>will be huge.
>>
>>so to test if this was any good i created the combined index :
>>    create index syslog_table_sh_sc_index on syslog_table (shostid,
>>sciscomsgid);
>>but using explain found it isnt being used? :
>>    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
>> ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
>>       ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
>>width=24)
>>
>>i still dont understand how to use indexes to increase the speed of
>>
>queries.
>
>>thanks for your help so far but i still feel lost,
>>
>>colm ennis
>>
>>-----Original Message-----
>>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>>Sent: 13 December 2001 15:29
>>To: colm ennis; PostgreSQL General Mailing list
>>Subject: Re: [GENERAL] slow queries on large syslog table
>>
>>
>>>
>>>ive also created a web interface for selecting syslogs based on optional
>>>combinations of timestamp, hostname and ciscomsg.
>>>
>>Combinations is the *magic* word.
>>
>>>to speed queries i created some indexes on syslog_table :
>>>    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
>>>    create index syslog_table_shostid_index on syslog_table (shostid);
>>>    create index syslog_table_sciscomsgid_index on syslog_table
>>>
>(sciscomsgid);
>
>>>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
>>>(shostid,sciscomsgid);
>>>
>>Most of them are of no use. That's what you observed...
>>
>>I bet you will get much better perfs with:
>>
>>create index syslog_table_stimestamp_shostid_sciscomsg_index on
>>
>syslog_table
>
>>(stimestamt, shostid, sciscomsg);
>>
>>You can try other combinations, but the one I suggested should be of use
>>in case you use all three on the query, (or even if you use only the
>>first, or the first two, though not sure about this last part, in
>>parentheses).
>>
>>
>>Believe me. Creating an index on a large table is of no use, unless it
>>is the right one. As a rule of thumb, include in the index as many of
>>the SELECTIVE columns present in the WHERE clause as you can.
>>
>>I am not sure of the selectivity of your columns (never used that
>>particular structure). There should be some information about that on
>>some of the system tables. However, I do not know in which, or how to
>>get that info.
>>
>>As a second rule of thumb, think how YOU would search for the data you
>>need if it was written on a paper book, and especially, how you would
>>like to find the book ordered. For example, if you were to look the
>>address corresponding to a phone number, you would like to find the data
>>ordered by phone number, and not by name. Name is not of any use to you.
>>
>>For selectivity, think of finding the phone numbers of all people that
>>live at number 5, but of any street, and whose first name is Peter.
>>
>>Neither "Peter" nor "5" are REALLY useful informations to perform your
>>search. And even, Peter is more useful than 5.
>>
>>HTH,
>>
>>Antonio
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 6: Have you searched our list archives?
>>
>>http://archives.postgresql.org
>>
>>.
>>
>
>
>
>
>.
>





Re: slow queries on large syslog table

From
"omid omoomi"
Date:
How about clusterring? "CLUSTER indexname ON table"


>From: "colm ennis" <colm.ennis@eircom.net>
>To: "omid omoomi" <oomoomi@hotmail.com>, <fiol@w3ping.com>
>CC: <pgsql-general@postgresql.org>
>Subject: Re: [GENERAL] slow queries on large syslog table
>Date: Fri, 14 Dec 2001 19:54:21 -0000
>
>hi omid,
>
>i have and it makes little difference to the query time or to the
>explain im afraid,
>
>colm
>
>-----Original Message-----
>From: omid omoomi [mailto:oomoomi@hotmail.com]
>Sent: 14 December 2001 10:23
>To: colm.ennis@eircom.net; fiol@w3ping.com
>Cc: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>hi,
>have you tried your queries with out order by clause?
>That might be significant.
>
>regards
>Omid Omoomi
>

_________________________________________________________________
Join the world�s largest e-mail service with MSN Hotmail.
http://www.hotmail.com