Re: slow queries on large syslog table - Mailing list pgsql-general

From colm ennis
Subject Re: slow queries on large syslog table
Date
Msg-id DKEIIKIBPBFHLPDHMIMGIECDCIAA.colm.ennis@eircom.net
Whole thread Raw
In response to Re: slow queries on large syslog table  ("omid omoomi" <oomoomi@hotmail.com>)
List pgsql-general
hi omid,

i have and it makes little difference to the query time or to the
explain im afraid,

colm

-----Original Message-----
From: omid omoomi [mailto:oomoomi@hotmail.com]
Sent: 14 December 2001 10:23
To: colm.ennis@eircom.net; fiol@w3ping.com
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] slow queries on large syslog table


hi,
have you tried your queries with out order by clause?
That might be significant.

regards
Omid Omoomi

>From: "colm ennis" <colm.ennis@eircom.net>
>To: Antonio Fiol Bonnin <fiol@w3ping.com>
>CC: "PostgreSQL General Mailing list" <pgsql-general@postgresql.org>
>Subject: Re: [GENERAL] slow queries on large syslog table
>Date: Thu, 13 Dec 2001 23:58:32 -0000
>
>hi all,
>
>thanks for your help, its comforting but also kinda scary to know
>im not the only one whos having trouble!
>
>in response to questions....
>
>as i mentioned before, the syslog_table is currently holds about
>1.7 million rows and is constantly slowly growing, the hostid_table
>and ciscomdgid_table each hold about 80 rows.
>
>the number of rows returned when i ran the query below :
>    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>was 19 in all cases, so i guess? the row estimations are woefully
>inaccurate.
>
>i ran a vacuum analyse a few minutes prior to trying these queries.
>
>my postgresql is version 7.1.3, is running on freebsd 4.3 and is the
>standard ports install.
>
>with regard to resources, heres the output of top mid select :
>    last pid: 77402;  load averages:  0.17,  0.08,  0.03
>up 87+05:05:42  23:36:25
>    48 processes:  2 running, 46 sleeping
>    CPU states: 11.6% user,  0.0% nice,  5.4% system,  0.0% interrupt, 82.9%
>idle
>    Mem: 47M Active, 139M Inact, 52M Wired, 8784K Cache, 35M Buf, 656K Free
>    Swap:
>
>      PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>    77400 pgsql      2   0  5956K  3568K RUN      0:04 21.08% 12.94% postgres
>    77399 www        2   0  7896K  6776K select   0:01  2.32%  1.46% perl
>    77257 pgsql      2   0  7292K  4832K sbwait   0:04  0.05%  0.05% postgres
>    65374 root      10   0  3440K  2696K nanslp  41:00  0.00%  0.00% perl
>    74942 pgsql      2   0  8048K  5876K sbwait   8:46  0.00%  0.00% postgres
>    75116 root       2   0  2148K  1124K poll     1:30  0.00%  0.00% syslogd
>(hmm ...seems like no swap device is configured, but theres loads of
>inactive pages anyway i guess)
>
>memory :
>    hw.physmem: 264351744
>cpu :
>    CPU: Pentium III/Pentium III Xeon/Celeron (547.18-MHz 686-class CPU)
>disk :
>    1 x 9gig scsi
>
>im not using the -B option so i guess im using the3 default number/size
>buffers.
>
>hope this helps!
>
>thanks again for all your help a i am completely clueless!
>
>colm ennis
>
>-----Original Message-----
>From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
>Sent: 13 December 2001 18:14
>To: colm ennis
>Cc: PostgreSQL General Mailing list
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>Are the rows estimations "real"?
>
>
>colm ennis wrote:
>
> >hi antonio,
> >
> >thanks for your advice.
> >
> >ive tried a lot of different index combinations, with extremely variable
> >results,
> >for instance :
> >    query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid
>IN
> >(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT
>1000
> >    matching messages - 19
> >
> >with original indexes :
> >    query time(s) - 225
> >    explain - Limit  (cost=0.00..34559.46 rows=1000 width=24)
> >  ->  Index Scan Backward using syslog_table_stimestamp_index on
> >syslog_table  (cost=0.00..577149.86 rows=16700 width=24)
> >
> >with antonios index :
> >    create index syslog_table_stimestamp_shostid_sciscomsgid_index on
> >syslog_table (stimestamp, shostid, sciscomsgid);
> >    query time(s) - 174
> >    explain - Limit  (cost=0.00..34329.14 rows=1000 width=24)
> >  ->  Index Scan Backward using syslog_table_st_sh_sc_index on
>syslog_table
> >(cost=0.00..580639.57 rows=16914 width=24)
> >
> >with NO! index :
> >    query time(s) - 77
> >    explain - Limit  (cost=73979.79..73979.79 rows=1000 width=24)
> >  ->  Sort  (cost=73979.79..73979.79 rows=16905 width=24)
> >        ->  Seq Scan on syslog_table  (cost=0.00..72591.62 rows=16905
> >width=24)
> >
> >i got similarily confusing results from other queries.
> >
> >it occured to me that that the index antonio suggests is going to be huge
> >because
> >of the per second timestamp.
> >
> >i thought about what you said about deciding how I would search for data.
> >
> >for the query above :
> >    SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
>(23,3))
> >AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
> >i would lookup shostid and sciscomsgid in a combined index, and then load
> >the
> >indexed rows in syslog_table.  there are about 80 different shostids and
> >sciscomsgids, so i guess the max size of this index would be 1600 rows.
> >
> >for other queries i would use an shostid index or sciscomsgid if just one
>of
> >these
> >fields appeared in the select, avoiding the stimestamp at all cost
>because
> >its index
> >will be huge.
> >
> >so to test if this was any good i created the combined index :
> >    create index syslog_table_sh_sc_index on syslog_table (shostid,
> >sciscomsgid);
> >but using explain found it isnt being used? :
> >    explain - Limit  (cost=74018.18..74018.18 rows=1000 width=24)
> >  ->  Sort  (cost=74018.18..74018.18 rows=16914 width=24)
> >        ->  Seq Scan on syslog_table  (cost=0.00..72629.33 rows=16914
> >width=24)
> >
> >i still dont understand how to use indexes to increase the speed of
>queries.
> >
> >thanks for your help so far but i still feel lost,
> >
> >colm ennis
> >
> >-----Original Message-----
> >From: Antonio Fiol Bonnin [mailto:fiol@w3ping.com]
> >Sent: 13 December 2001 15:29
> >To: colm ennis; PostgreSQL General Mailing list
> >Subject: Re: [GENERAL] slow queries on large syslog table
> >
> >
> >>
> >>
> >>ive also created a web interface for selecting syslogs based on optional
> >>combinations of timestamp, hostname and ciscomsg.
> >>
> >
> >Combinations is the *magic* word.
> >
> >>to speed queries i created some indexes on syslog_table :
> >>    create index syslog_table_stimestamp_index on syslog_table
>(stimestamp);
> >>    create index syslog_table_shostid_index on syslog_table (shostid);
> >>    create index syslog_table_sciscomsgid_index on syslog_table
>(sciscomsgid);
> >>    create index syslog_table_shostid_sciscomsgid_index on syslog_table
> >>(shostid,sciscomsgid);
> >>
> >Most of them are of no use. That's what you observed...
> >
> >I bet you will get much better perfs with:
> >
> >create index syslog_table_stimestamp_shostid_sciscomsg_index on
>syslog_table
> >(stimestamt, shostid, sciscomsg);
> >
> >You can try other combinations, but the one I suggested should be of use
> >in case you use all three on the query, (or even if you use only the
> >first, or the first two, though not sure about this last part, in
> >parentheses).
> >
> >
> >Believe me. Creating an index on a large table is of no use, unless it
> >is the right one. As a rule of thumb, include in the index as many of
> >the SELECTIVE columns present in the WHERE clause as you can.
> >
> >I am not sure of the selectivity of your columns (never used that
> >particular structure). There should be some information about that on
> >some of the system tables. However, I do not know in which, or how to
> >get that info.
> >
> >As a second rule of thumb, think how YOU would search for the data you
> >need if it was written on a paper book, and especially, how you would
> >like to find the book ordered. For example, if you were to look the
> >address corresponding to a phone number, you would like to find the data
> >ordered by phone number, and not by name. Name is not of any use to you.
> >
> >For selectivity, think of finding the phone numbers of all people that
> >live at number 5, but of any street, and whose first name is Peter.
> >
> >Neither "Peter" nor "5" are REALLY useful informations to perform your
> >search. And even, Peter is more useful than 5.
> >
> >HTH,
> >
> >Antonio
> >
> >
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 6: Have you searched our list archives?
> >
> >http://archives.postgresql.org
> >
> >.
> >
>
>
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html




_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp.



pgsql-general by date:

Previous
From: J Smith
Date:
Subject: Re: Correction: Working on "SELECT * WHERE numeric_col = 2001.2" problem?
Next
From: Joe Koenig
Date:
Subject: Re: How to increase shared mem for PostgreSQL on FreeBSD