Re: slow queries on large syslog table - Mailing list pgsql-general
From | Antonio Fiol Bonnín |
---|---|
Subject | Re: slow queries on large syslog table |
Date | |
Msg-id | 3C18EFCE.9060706@w3ping.com Whole thread Raw |
In response to | Re: slow queries on large syslog table ("colm ennis" <colm.ennis@eircom.net>) |
Responses |
Re: slow queries on large syslog table
|
List | pgsql-general |
Are the rows estimations "real"? colm ennis wrote: >hi antonio, > >thanks for your advice. > >ive tried a lot of different index combinations, with extremely variable >results, >for instance : > query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN >(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000 > matching messages - 19 > >with original indexes : > query time(s) - 225 > explain - Limit (cost=0.00..34559.46 rows=1000 width=24) > -> Index Scan Backward using syslog_table_stimestamp_index on >syslog_table (cost=0.00..577149.86 rows=16700 width=24) > >with antonios index : > create index syslog_table_stimestamp_shostid_sciscomsgid_index on >syslog_table (stimestamp, shostid, sciscomsgid); > query time(s) - 174 > explain - Limit (cost=0.00..34329.14 rows=1000 width=24) > -> Index Scan Backward using syslog_table_st_sh_sc_index on syslog_table >(cost=0.00..580639.57 rows=16914 width=24) > >with NO! index : > query time(s) - 77 > explain - Limit (cost=73979.79..73979.79 rows=1000 width=24) > -> Sort (cost=73979.79..73979.79 rows=16905 width=24) > -> Seq Scan on syslog_table (cost=0.00..72591.62 rows=16905 >width=24) > >i got similarily confusing results from other queries. > >it occured to me that that the index antonio suggests is going to be huge >because >of the per second timestamp. > >i thought about what you said about deciding how I would search for data. > >for the query above : > SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3)) >AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000 >i would lookup shostid and sciscomsgid in a combined index, and then load >the >indexed rows in syslog_table. there are about 80 different shostids and >sciscomsgids, so i guess the max size of this index would be 1600 rows. > >for other queries i would use an shostid index or sciscomsgid if just one of >these >fields appeared in the select, avoiding the stimestamp at all cost because >its index >will be huge. > >so to test if this was any good i created the combined index : > create index syslog_table_sh_sc_index on syslog_table (shostid, >sciscomsgid); >but using explain found it isnt being used? : > explain - Limit (cost=74018.18..74018.18 rows=1000 width=24) > -> Sort (cost=74018.18..74018.18 rows=16914 width=24) > -> Seq Scan on syslog_table (cost=0.00..72629.33 rows=16914 >width=24) > >i still dont understand how to use indexes to increase the speed of queries. > >thanks for your help so far but i still feel lost, > >colm ennis > >-----Original Message----- >From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com] >Sent: 13 December 2001 15:29 >To: colm ennis; PostgreSQL General Mailing list >Subject: Re: [GENERAL] slow queries on large syslog table > > >> >> >>ive also created a web interface for selecting syslogs based on optional >>combinations of timestamp, hostname and ciscomsg. >> > >Combinations is the *magic* word. > >>to speed queries i created some indexes on syslog_table : >> create index syslog_table_stimestamp_index on syslog_table (stimestamp); >> create index syslog_table_shostid_index on syslog_table (shostid); >> create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid); >> create index syslog_table_shostid_sciscomsgid_index on syslog_table >>(shostid,sciscomsgid); >> >Most of them are of no use. That's what you observed... > >I bet you will get much better perfs with: > >create index syslog_table_stimestamp_shostid_sciscomsg_index on syslog_table >(stimestamt, shostid, sciscomsg); > >You can try other combinations, but the one I suggested should be of use >in case you use all three on the query, (or even if you use only the >first, or the first two, though not sure about this last part, in >parentheses). > > >Believe me. Creating an index on a large table is of no use, unless it >is the right one. As a rule of thumb, include in the index as many of >the SELECTIVE columns present in the WHERE clause as you can. > >I am not sure of the selectivity of your columns (never used that >particular structure). There should be some information about that on >some of the system tables. However, I do not know in which, or how to >get that info. > >As a second rule of thumb, think how YOU would search for the data you >need if it was written on a paper book, and especially, how you would >like to find the book ordered. For example, if you were to look the >address corresponding to a phone number, you would like to find the data >ordered by phone number, and not by name. Name is not of any use to you. > >For selectivity, think of finding the phone numbers of all people that >live at number 5, but of any street, and whose first name is Peter. > >Neither "Peter" nor "5" are REALLY useful informations to perform your >search. And even, Peter is more useful than 5. > >HTH, > >Antonio > > > >---------------------------(end of broadcast)--------------------------- >TIP 6: Have you searched our list archives? > >http://archives.postgresql.org > >. >
pgsql-general by date: