Re: slow queries on large syslog table - Mailing list pgsql-general

From Antonio Fiol Bonnín
Subject Re: slow queries on large syslog table
Date
Msg-id 3C18EFCE.9060706@w3ping.com
Whole thread Raw
In response to Re: slow queries on large syslog table  ("colm ennis" <colm.ennis@eircom.net>)
Responses Re: slow queries on large syslog table
List pgsql-general
Are the rows estimations "real"?


colm ennis wrote:

>hi antonio,
>
>thanks for your advice.
>
>ive tried a lot of different index combinations, with extremely variable
>results,
>for instance :
>    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
>(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>    matching messages - 19
>
>with original indexes :
>    query time(s) - 225
>    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
>  ->  Index Scan Backward using syslog_table_stimestamp_index on
>syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
>
>with antonios index :
>    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
>syslog_table (stimestamp, shostid, sciscomsgid);
>    query time(s) - 174
>    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
>  ->  Index Scan Backward using syslog_table_st_sh_sc_index on syslog_table
>(cost=0.00..580639.57 rows=16914 width=24)
>
>with NO! index :
>    query time(s) - 77
>    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
>  ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
>        ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
>width=24)
>
>i got similarily confusing results from other queries.
>
>it occured to me that that the index antonio suggests is going to be huge
>because
>of the per second timestamp.
>
>i thought about what you said about deciding how I would search for data.
>
>for the query above :
>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>i would lookup shostid and sciscomsgid in a combined index, and then load
>the
>indexed rows in syslog_table.  there are about 80 different shostids and
>sciscomsgids, so i guess the max size of this index would be 1600 rows.
>
>for other queries i would use an shostid index or sciscomsgid if just one of
>these
>fields appeared in the select, avoiding the stimestamp at all cost because
>its index
>will be huge.
>
>so to test if this was any good i created the combined index :
>    create index syslog_table_sh_sc_index on syslog_table (shostid,
>sciscomsgid);
>but using explain found it isnt being used? :
>    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
>  ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
>        ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
>width=24)
>
>i still dont understand how to use indexes to increase the speed of queries.
>
>thanks for your help so far but i still feel lost,
>
>colm ennis
>
>-----Original Message-----
>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>Sent: 13 December 2001 15:29
>To: colm ennis; PostgreSQL General Mailing list
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>>
>>
>>ive also created a web interface for selecting syslogs based on optional
>>combinations of timestamp, hostname and ciscomsg.
>>
>
>Combinations is the *magic* word.
>
>>to speed queries i created some indexes on syslog_table :
>>    create index syslog_table_stimestamp_index on syslog_table (stimestamp);
>>    create index syslog_table_shostid_index on syslog_table (shostid);
>>    create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
>>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
>>(shostid,sciscomsgid);
>>
>Most of them are of no use. That's what you observed...
>
>I bet you will get much better perfs with:
>
>create index syslog_table_stimestamp_shostid_sciscomsg_index on syslog_table
>(stimestamt, shostid, sciscomsg);
>
>You can try other combinations, but the one I suggested should be of use
>in case you use all three on the query, (or even if you use only the
>first, or the first two, though not sure about this last part, in
>parentheses).
>
>
>Believe me. Creating an index on a large table is of no use, unless it
>is the right one. As a rule of thumb, include in the index as many of
>the SELECTIVE columns present in the WHERE clause as you can.
>
>I am not sure of the selectivity of your columns (never used that
>particular structure). There should be some information about that on
>some of the system tables. However, I do not know in which, or how to
>get that info.
>
>As a second rule of thumb, think how YOU would search for the data you
>need if it was written on a paper book, and especially, how you would
>like to find the book ordered. For example, if you were to look the
>address corresponding to a phone number, you would like to find the data
>ordered by phone number, and not by name. Name is not of any use to you.
>
>For selectivity, think of finding the phone numbers of all people that
>live at number 5, but of any street, and whose first name is Peter.
>
>Neither "Peter" nor "5" are REALLY useful informations to perform your
>search. And even, Peter is more useful than 5.
>
>HTH,
>
>Antonio
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
>http://archives.postgresql.org
>
>.
>




pgsql-general by date:

Previous
From: Jason Earl
Date:
Subject: Re: Money reformatting
Next
From: Jason Earl
Date:
Subject: Re: Can I call unix/linux commands within plsql?