Re: slow queries on large syslog table - Mailing list pgsql-general
From | colm ennis |
---|---|
Subject | Re: slow queries on large syslog table |
Date | |
Msg-id | DKEIIKIBPBFHLPDHMIMGIECDCIAA.colm.ennis@eircom.net Whole thread Raw |
In response to | Re: slow queries on large syslog table ("omid omoomi" <oomoomi@hotmail.com>) |
List | pgsql-general |
hi omid, i have and it makes little difference to the query time or to the explain im afraid, colm -----Original Message----- From: omid omoomi [mailto:oomoomi@hotmail.com] Sent: 14 December 2001 10:23 To: colm.ennis@eircom.net; fiol@w3ping.com Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] slow queries on large syslog table hi, have you tried your queries with out order by clause? That might be significant. regards Omid Omoomi >From: "colm ennis" <colm.ennis@eircom.net> >To: Antonio Fiol Bonnin <fiol@w3ping.com> >CC: "PostgreSQL General Mailing list" <pgsql-general@postgresql.org> >Subject: Re: [GENERAL] slow queries on large syslog table >Date: Thu, 13 Dec 2001 23:58:32 -0000 > >hi all, > >thanks for your help, its comforting but also kinda scary to know >im not the only one whos having trouble! > >in response to questions.... > >as i mentioned before, the syslog_table is currently holds about >1.7 million rows and is constantly slowly growing, the hostid_table >and ciscomdgid_table each hold about 80 rows. > >the number of rows returned when i ran the query below : > SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3)) >AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000 >was 19 in all cases, so i guess? the row estimations are woefully >inaccurate. > >i ran a vacuum analyse a few minutes prior to trying these queries. > >my postgresql is version 7.1.3, is running on freebsd 4.3 and is the >standard ports install. > >with regard to resources, heres the output of top mid select : > last pid: 77402; load averages: 0.17, 0.08, 0.03 >up 87+05:05:42 23:36:25 > 48 processes: 2 running, 46 sleeping > CPU states: 11.6% user, 0.0% nice, 5.4% system, 0.0% interrupt, 82.9% >idle > Mem: 47M Active, 139M Inact, 52M Wired, 8784K Cache, 35M Buf, 656K Free > Swap: > > PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND > 77400 pgsql 2 0 5956K 3568K RUN 0:04 21.08% 12.94% postgres > 77399 www 2 0 7896K 6776K select 0:01 2.32% 1.46% perl > 77257 pgsql 2 0 7292K 4832K sbwait 0:04 0.05% 0.05% postgres > 65374 root 10 0 3440K 2696K nanslp 41:00 0.00% 0.00% perl > 74942 pgsql 2 0 8048K 5876K sbwait 8:46 0.00% 0.00% postgres > 75116 root 2 0 2148K 1124K poll 1:30 0.00% 0.00% syslogd >(hmm ...seems like no swap device is configured, but theres loads of >inactive pages anyway i guess) > >memory : > hw.physmem: 264351744 >cpu : > CPU: Pentium III/Pentium III Xeon/Celeron (547.18-MHz 686-class CPU) >disk : > 1 x 9gig scsi > >im not using the -B option so i guess im using the3 default number/size >buffers. > >hope this helps! > >thanks again for all your help a i am completely clueless! > >colm ennis > >-----Original Message----- >From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com] >Sent: 13 December 2001 18:14 >To: colm ennis >Cc: PostgreSQL General Mailing list >Subject: Re: [GENERAL] slow queries on large syslog table > > >Are the rows estimations "real"? > > >colm ennis wrote: > > >hi antonio, > > > >thanks for your advice. > > > >ive tried a lot of different index combinations, with extremely variable > >results, > >for instance : > > query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid >IN > >(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT >1000 > > matching messages - 19 > > > >with original indexes : > > query time(s) - 225 > > explain - Limit (cost=0.00..34559.46 rows=1000 width=24) > > -> Index Scan Backward using syslog_table_stimestamp_index on > >syslog_table (cost=0.00..577149.86 rows=16700 width=24) > > > >with antonios index : > > create index syslog_table_stimestamp_shostid_sciscomsgid_index on > >syslog_table (stimestamp, shostid, sciscomsgid); > > query time(s) - 174 > > explain - Limit (cost=0.00..34329.14 rows=1000 width=24) > > -> Index Scan Backward using syslog_table_st_sh_sc_index on >syslog_table > >(cost=0.00..580639.57 rows=16914 width=24) > > > >with NO! index : > > query time(s) - 77 > > explain - Limit (cost=73979.79..73979.79 rows=1000 width=24) > > -> Sort (cost=73979.79..73979.79 rows=16905 width=24) > > -> Seq Scan on syslog_table (cost=0.00..72591.62 rows=16905 > >width=24) > > > >i got similarily confusing results from other queries. > > > >it occured to me that that the index antonio suggests is going to be huge > >because > >of the per second timestamp. > > > >i thought about what you said about deciding how I would search for data. > > > >for the query above : > > SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN >(23,3)) > >AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000 > >i would lookup shostid and sciscomsgid in a combined index, and then load > >the > >indexed rows in syslog_table. there are about 80 different shostids and > >sciscomsgids, so i guess the max size of this index would be 1600 rows. > > > >for other queries i would use an shostid index or sciscomsgid if just one >of > >these > >fields appeared in the select, avoiding the stimestamp at all cost >because > >its index > >will be huge. > > > >so to test if this was any good i created the combined index : > > create index syslog_table_sh_sc_index on syslog_table (shostid, > >sciscomsgid); > >but using explain found it isnt being used? : > > explain - Limit (cost=74018.18..74018.18 rows=1000 width=24) > > -> Sort (cost=74018.18..74018.18 rows=16914 width=24) > > -> Seq Scan on syslog_table (cost=0.00..72629.33 rows=16914 > >width=24) > > > >i still dont understand how to use indexes to increase the speed of >queries. > > > >thanks for your help so far but i still feel lost, > > > >colm ennis > > > >-----Original Message----- > >From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com] > >Sent: 13 December 2001 15:29 > >To: colm ennis; PostgreSQL General Mailing list > >Subject: Re: [GENERAL] slow queries on large syslog table > > > > > >> > >> > >>ive also created a web interface for selecting syslogs based on optional > >>combinations of timestamp, hostname and ciscomsg. > >> > > > >Combinations is the *magic* word. > > > >>to speed queries i created some indexes on syslog_table : > >> create index syslog_table_stimestamp_index on syslog_table >(stimestamp); > >> create index syslog_table_shostid_index on syslog_table (shostid); > >> create index syslog_table_sciscomsgid_index on syslog_table >(sciscomsgid); > >> create index syslog_table_shostid_sciscomsgid_index on syslog_table > >>(shostid,sciscomsgid); > >> > >Most of them are of no use. That's what you observed... > > > >I bet you will get much better perfs with: > > > >create index syslog_table_stimestamp_shostid_sciscomsg_index on >syslog_table > >(stimestamt, shostid, sciscomsg); > > > >You can try other combinations, but the one I suggested should be of use > >in case you use all three on the query, (or even if you use only the > >first, or the first two, though not sure about this last part, in > >parentheses). > > > > > >Believe me. Creating an index on a large table is of no use, unless it > >is the right one. As a rule of thumb, include in the index as many of > >the SELECTIVE columns present in the WHERE clause as you can. > > > >I am not sure of the selectivity of your columns (never used that > >particular structure). There should be some information about that on > >some of the system tables. However, I do not know in which, or how to > >get that info. > > > >As a second rule of thumb, think how YOU would search for the data you > >need if it was written on a paper book, and especially, how you would > >like to find the book ordered. For example, if you were to look the > >address corresponding to a phone number, you would like to find the data > >ordered by phone number, and not by name. Name is not of any use to you. > > > >For selectivity, think of finding the phone numbers of all people that > >live at number 5, but of any street, and whose first name is Peter. > > > >Neither "Peter" nor "5" are REALLY useful informations to perform your > >search. And even, Peter is more useful than 5. > > > >HTH, > > > >Antonio > > > > > > > >---------------------------(end of broadcast)--------------------------- > >TIP 6: Have you searched our list archives? > > > >http://archives.postgresql.org > > > >. > > > > > > > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/users-lounge/docs/faq.html _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp.
pgsql-general by date: