Home > mailing lists

Re: slow queries on large syslog table - Mailing list pgsql-general

From	colm ennis
Subject	Re: slow queries on large syslog table
Date	December 13, 2001 12:36:24
Msg-id	DKEIIKIBPBFHLPDHMIMGAEAACIAA.colm.ennis@eircom.net Whole thread Raw
In response to	Re: slow queries on large syslog table (Antonio Fiol Bonnín <fiol@w3ping.com>)
Responses	Re: slow queries on large syslog table
List	pgsql-general

Tree view

hi antonio,

thanks for your advice.

ive tried a lot of different index combinations, with extremely variable
results,
for instance :
    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
    matching messages - 19

with original indexes :
    query time(s) - 225
    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_stimestamp_index on
syslog_table  (cost=0.00..577149.86 rows=16700 width=24)

with antonios index :
    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
syslog_table (stimestamp, shostid, sciscomsgid);
    query time(s) - 174
    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
  ->  Index Scan Backward using syslog_table_st_sh_sc_index on syslog_table
(cost=0.00..580639.57 rows=16914 width=24)

with NO! index :
    query time(s) - 77
    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
  ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
width=24)

i got similarily confusing results from other queries.

it occured to me that that the index antonio suggests is going to be huge
because
of the per second timestamp.

i thought about what you said about deciding how I would search for data.

for the query above :
    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
i would lookup shostid and sciscomsgid in a combined index, and then load
the
indexed rows in syslog_table.  there are about 80 different shostids and
sciscomsgids, so i guess the max size of this index would be 1600 rows.

for other queries i would use an shostid index or sciscomsgid if just one of
these
fields appeared in the select, avoiding the stimestamp at all cost because
its index
will be huge.

so to test if this was any good i created the combined index :
    create index syslog_table_sh_sc_index on syslog_table (shostid,
sciscomsgid);
but using explain found it isnt being used? :
    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
  ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
        ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
width=24)

i still dont understand how to use indexes to increase the speed of queries.

thanks for your help so far but i still feel lost,

colm ennis

-----Original Message-----
From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
Sent: 13 December 2001 15:29
To: colm ennis; PostgreSQL General Mailing list
Subject: Re: [GENERAL] slow queries on large syslog table


>
>
>
>ive also created a web interface for selecting syslogs based on optional
>combinations of timestamp, hostname and ciscomsg.
>

Combinations is the *magic* word.

>to speed queries i created some indexes on syslog_table :
>    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
>    create index syslog_table_shostid_index on syslog_table (shostid);
>    create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
>(shostid,sciscomsgid);
>
Most of them are of no use. That's what you observed...

I bet you will get much better perfs with:

create index syslog_table_stimestamp_shostid_sciscomsg_index on syslog_table
(stimestamt, shostid, sciscomsg);

You can try other combinations, but the one I suggested should be of use
in case you use all three on the query, (or even if you use only the
first, or the first two, though not sure about this last part, in
parentheses).


Believe me. Creating an index on a large table is of no use, unless it
is the right one. As a rule of thumb, include in the index as many of
the SELECTIVE columns present in the WHERE clause as you can.

I am not sure of the selectivity of your columns (never used that
particular structure). There should be some information about that on
some of the system tables. However, I do not know in which, or how to
get that info.

As a second rule of thumb, think how YOU would search for the data you
need if it was written on a paper book, and especially, how you would
like to find the book ordered. For example, if you were to look the
address corresponding to a phone number, you would like to find the data
ordered by phone number, and not by name. Name is not of any use to you.

For selectivity, think of finding the phone numbers of all people that
live at number 5, but of any street, and whose first name is Peter.

Neither "Peter" nor "5" are REALLY useful informations to perform your
search. And even, Peter is more useful than 5.

HTH,

Antonio

pgsql-general by date:

From: Luis Amigo
Date: 13 December 2001, 12:08:38
Subject: Re: excessive performance difference

From: Chris Albertson
Date: 13 December 2001, 12:39:44
Subject: Re: Can I call unix/linux commands within plsql?

Re: slow queries on large syslog table - Mailing list pgsql-general

Previous

Next