Thread: Weird performance drop after VACUUM

From
Ümit Öztosun
Date:
Hello,

We are using PostgreSQL for our business application. Recently, during
testing of our application with large volumes of data, we faced a weird
problem. Our query performance dropped *dramatically* after a "VACUUM FULL
ANALYZE" command. We found a similar problem in the mailing list archives,
but that submitter solved his problem by rewriting his query, which is
unfortunately very hard for us.

I am attaching two EXPLAIN ANALYZE outputs: the first is from just before
the VACUUM FULL ANALYZE command and the second is from after it. Also
attached is the SQL query, which has been simplified to clarify the problem.
For the example query, the time increases from 1.8 seconds to > 4.0 seconds.
The difference for the complete query is much bigger: query time increases
from 7.8 seconds to > 110 seconds.

Any help is appreciated; we were unable to identify what causes the
query planner to choose a different, poorly performing plan.

Notes:
Our production platform is Ubuntu Linux Hoary on i386, with PostgreSQL
8.0.3 compiled from source. The same tests were carried out on Windows XP
Professional with PostgreSQL 8.0.1, with similar results. The queries use
little I/O but high CPU. The largest table involved in the sample query has
about 10000 rows. Indexes are used intensively; some tables have > 4
indexes.

Best regards,
Umit Oztosun



Re: Weird performance drop after VACUUM

From
asif ali
Date:
Hi,
I have the same issue. After doing "VACUUM ANALYZE",
the performance of the query dropped.

Here is the query
explain select * from  conversion_table c where
c.conversion_date BETWEEN '2005-06-07' and
'2005-08-17'

Before "VACUUM ANALYZE"

"Index Scan using conversion_table_pk on
keyword_conversion_table c  (cost=0.00..18599.25
rows=4986 width=95)"
"  Index Cond: ((conversion_date >=
'2005-06-07'::date) AND (conversion_date <=
'2005-08-17'::date))"


After "VACUUM ANALYZE"


"Seq Scan on conversion_table c  (cost=0.00..29990.83
rows=1094820 width=66)"
"  Filter: ((conversion_date >= '2005-06-07'::date)
AND (conversion_date <= '2005-08-17'::date))"


I don't know why the system is doing a "Seq Scan" now.

Thanks

asif ali







--- Ümit Öztosun <umit@likyabilisim.com> wrote:

> Hello,
>
> We are using PostgreSQL for our business
> application. Recently, during
> testing of our application with large volumes of
> data, we faced a weird
> problem. Our query performance dropped
> *dramatically* after "VACUUM FULL
> ANALYZE" command. We have encountered a similar
> problem listed on
> mailing list archives, but the submitter solved his
> problem by rewriting
> his query, which is unfortunately very hard for us.
>
> I am attaching two EXPLAIN ANALYZE outputs, first
> one is just before the
> VACUUM FULL ANALYZE command and the other is the one
> after. Also
> attached is the SQL query, which is simplified to
> clarify the problem.
> In the example query time increases from 1.8 seconds
> to > 4.0 seconds. The
> difference for the complete query is much bigger,
> query time increases
> from 7.8 seconds to > 110 seconds.
>
> Any help is appreciated, we were unable to identify
> what causes the
> query planner to choose a different/poor performing
> plan.
>
> Notes:
> Our production platform is Ubuntu Linux Hoary on
> i386, PostgreSQL 8.0.3,
> compiled from sources. Same tests were carried on
> Windows XP
> Professional and PostgreSQL 8.0.1 with similar
> results. The queries use
> little IO, high CPU. The largest table involved in
> the sample query has
> about 10000 rows. Indexes are used intensively, some
> tables use > 4
> indexes.
>
> Best regards,
> Umit Oztosun
>
> > SELECT * FROM (
>     SELECT
>         COALESCE (
>             (SELECT COALESCE (sum(irskal.anamiktar),
> 0)
>                 * (SELECT
>                     birim.fiyat2 * (SELECT kur1
>                                     FROM
> sis_doviz_kuru kur
>                                     WHERE
> birim._key_sis_doviz2 = kur._key_sis_doviz
>                                     ORDER BY tarih
> desc
>                                     LIMIT 1)
>                     FROM scf_stokkart_birimleri
> birim
>                     WHERE _key_scf_stokkart =
> stok._key
>                     AND anabirim = '1'
>                   )
>                  FROM scf_irsaliye irs,
> scf_irsaliye_kalemi irskal
>                  WHERE irskal._key_kalemturu =
> stok._key
>                  AND irskal._key_scf_irsaliye =
> irs._key
>                  AND irs.karsifirma = 'KENDI'
>                  AND (irs.turu='MAI' OR
> irs.turu='KGI' OR irs.turu='PS' OR irs.turu='TS' OR
> irs.turu='KC' OR irs.turu='KCO')
>                  AND ( irs._key_sis_depo_dest =
> '$$$$0000003l$1$$'  OR  irs._key_sis_depo_dest =
> '$$$$00000048$1$$'  OR  irs._key_sis_depo_dest =
> '$$$$0000004b$1$$'  OR  irs._key_sis_depo_dest =
> '$$$$0000004d$1$$' )
>                  AND ((irskal._key LIKE '0000%' OR
> irskal._key LIKE '0101%' OR irskal._key LIKE '$$%'))
>                  AND irs.tarih <= '2005-08-26'
>             ), 0
>         ) as arti_fiili_irs_karsifirma,
>         stok.*
>         FROM scf_stokkart stok
> ) AS _SWT WHERE (_key LIKE '00%' OR _key LIKE '01%'
> OR _key LIKE '$$%') ORDER BY _key desc
> > Before VACUUM FULL ANALYZE - Short Query
> ---------------------------------------
> Sort  (cost=9094.31..9094.40 rows=37 width=817)
> (actual time=1852.799..1877.738 rows=10000 loops=1)
>   Sort Key: stok._key
>   ->  Seq Scan on scf_stokkart stok
> (cost=0.00..9093.34 rows=37 width=817) (actual
> time=8.670..1575.586 rows=10000 loops=1)
>         Filter: (((_key)::text ~~ '00%'::text) OR
> ((_key)::text ~~ '01%'::text) OR ((_key)::text ~~
> '$$%'::text))
>         SubPlan
>           ->  Aggregate  (cost=237.29..237.29 rows=1
> width=16) (actual time=0.136..0.138 rows=1
> loops=10000)
>                 InitPlan
>                   ->  Index Scan using
> scf_stokkart_birimleri_key_scf_stokkart_idx on
> scf_stokkart_birimleri birim  (cost=0.00..209.59
> rows=1 width=58) (actual time=0.088..0.093 rows=1
> loops=10000)
>                         Index Cond:
> ((_key_scf_stokkart)::text = ($1)::text)
>                         Filter: (anabirim =
> '1'::bpchar)
>                         SubPlan
>                           ->  Limit
> (cost=9.31..9.31 rows=1 width=17) (actual
> time=0.046..0.048 rows=1 loops=10000)
>                                 ->  Sort
> (cost=9.31..9.31 rows=2 width=17) (actual
> time=0.041..0.041 rows=1 loops=10000)
>                                       Sort Key:
> tarih
>                                       ->  Index Scan
> using sis_doviz_kuru_key_sis_doviz_idx on
> sis_doviz_kuru kur  (cost=0.00..9.30 rows=2
> width=17) (actual time=0.018..0.029 rows=2
> loops=10000)
>                                             Index
> Cond: (($0)::text = (_key_sis_doviz)::text)
>                 ->  Nested Loop  (cost=0.00..27.69
> rows=1 width=16) (actual time=0.033..0.033 rows=0
> loops=10000)
>                       ->  Index Scan using
> scf_irsaliye_kalemi_key_kalemturu_idx on
> scf_irsaliye_kalemi irskal  (cost=0.00..21.75 rows=1
> width=58) (actual time=0.017..0.020 rows=0
> loops=10000)
>                             Index Cond:
> ((_key_kalemturu)::text = ($1)::text)
>                             Filter: (((_key)::text
> ~~ '0000%'::text) OR ((_key)::text ~~ '0101%'::text)
> OR ((_key)::text ~~ '$$%'::text))
>                       ->  Index Scan using
> scf_irsaliye_pkey on scf_irsaliye irs
> (cost=0.00..5.94 rows=1 width=42) (actual
> time=0.021..0.021 rows=0 loops=3000)
>                             Index Cond:
> (("outer"._key_scf_irsaliye)::text =
> (irs._key)::text)
>                             Filter:
> (((karsifirma)::text = 'KENDI'::text) AND
> (((turu)::text = 'MAI'::text) OR ((turu)::text =
> 'KGI'::text) OR ((turu)::text = 'PS'::text) OR
> ((turu)::text = 'TS'::text) OR ((turu)::text =
> 'KC'::text) OR ((turu)::text = 'KCO'::text)) AND
> (((_key_sis_depo_dest)::text =
> '$$$$0000003l$1$$'::text) OR
> ((_key_sis_depo_dest)::text =
> '$$$$00000048$1$$'::text) OR
> ((_key_sis_depo_dest)::text =
> '$$$$0000004b$1$$'::text) OR
> ((_key_sis_depo_dest)::text =
> '$$$$0000004d$1$$'::text)) AND (tarih <=
> '2005-08-26'::date))
> Total runtime: 1899.533 ms
> > After VACUUM FULL ANALYZE - Short Query
> ---------------------------------------
> Index Scan Backward using scf_stokkart_pkey on
> scf_stokkart stok  (cost=0.00..392045.63 rows=9998
> width=166) (actual time=0.661..4431.568 rows=10000
> loops=1)
>   Filter: (((_key)::text ~~ '00%'::text) OR
> ((_key)::text ~~ '01%'::text) OR ((_key)::text ~~
> '$$%'::text))
>   SubPlan
>     ->  Aggregate  (cost=39.16..39.16 rows=1
> width=10) (actual time=0.416..0.418 rows=1
> loops=10000)
>           InitPlan
>             ->  Index Scan using
> scf_stokkart_birimleri_key_scf_stokkart_idx on
> scf_stokkart_birimleri birim  (cost=0.00..5.25
> rows=2 width=28) (actual time=0.101..0.105 rows=1
> loops=10000)
>                   Index Cond:
> ((_key_scf_stokkart)::text = ($1)::text)
>                   Filter: (anabirim = '1'::bpchar)
>                   SubPlan
>                     ->  Limit  (cost=1.08..1.09
> rows=1 width=15) (actual time=0.048..0.050 rows=1
> loops=10000)
>                           ->  Sort  (cost=1.08..1.09
> rows=2 width=15) (actual time=0.043..0.043 rows=1
> loops=10000)
>                                 Sort Key: tarih
>                                 ->  Seq Scan on
> sis_doviz_kuru kur  (cost=0.00..1.07 rows=2
> width=15) (actual time=0.009..0.026 rows=2
> loops=10000)
>                                       Filter:
> (($0)::text = (_key_sis_doviz)::text)
>           ->  Nested Loop  (cost=0.00..33.90 rows=1
> width=10) (actual time=0.295..0.295 rows=0
> loops=10000)
>                 ->  Seq Scan on scf_irsaliye irs
> (cost=0.00..30.00 rows=1 width=20) (actual
> time=0.290..0.290 rows=0 loops=10000)
>                       Filter: (((karsifirma)::text =
> 'KENDI'::text) AND (((turu)::text = 'MAI'::text) OR
> ((turu)::text = 'KGI'::text) OR ((turu)::text =
> 'PS'::text) OR ((turu)::text = 'TS'::text) OR
> ((turu)::text = 'KC'::text) OR ((turu)::text =
> 'KCO'::text)) AND (((_key_sis_depo_dest)::text =
> '$$$$0000003l$1$$'::text) OR
> ((_key_sis_depo_dest)::text =
> '$$$$00000048$1$$'::text) OR
> ((_key_sis_depo_dest)::text =
> '$$$$0000004b$1$$'::text) OR
> ((_key_sis_depo_dest)::text =
> '$$$$0000004d$1$$'::text)) AND (tarih <=
> '2005-08-26'::date))
>                 ->  Index Scan using
> scf_irsaliye_kalemi_key_scf_irsaliye_idx on
> scf_irsaliye_kalemi irskal  (cost=0.00..3.89 rows=1
> width=30) (never executed)
>                       Index Cond:
> ((irskal._key_scf_irsaliye)::text =
> ("outer"._key)::text)
>                       Filter:
> (((_key_kalemturu)::text = ($1)::text) AND
> (((_key)::text ~~ '0000%'::text) OR ((_key)::text ~~
> '0101%'::text) OR ((_key)::text ~~ '$$%'::text)))
> Total runtime: 4456.895 ms





Re: Weird performance drop after VACUUM

From
Philip Hallstrom
Date:
> Hi,
> I have the same issue. After doing "VACUUM ANALYZE"
> performance of the query dropped.
>
> Here is the query
> explain select * from  conversion_table c where
> c.conversion_date BETWEEN '2005-06-07' and
> '2005-08-17'
>
> Before "VACUUM ANALYZE"
>
> "Index Scan using conversion_table_pk on
> keyword_conversion_table c  (cost=0.00..18599.25
> rows=4986 width=95)"
> "  Index Cond: ((conversion_date >=
> '2005-06-07'::date) AND (conversion_date <=
> '2005-08-17'::date))"
>
>
> After "VACUUM ANALYZE"
>
>
> "Seq Scan on conversion_table c  (cost=0.00..29990.83
> rows=1094820 width=66)"
> "  Filter: ((conversion_date >= '2005-06-07'::date)
> AND (conversion_date <= '2005-08-17'::date))"
>
>
> I don't know why the system is doing a "Seq Scan" now.

I could be wrong as I'm definitely no expert on reading the output of
EXPLAIN, but it seems to say that prior to the VACUUM it expected to
retrieve 4986 rows, and afterwards it expects to retrieve 1094820 rows.

Which is a pretty big difference.

So maybe the statistics were just really, really off prior to vacuuming;
once it did vacuum, the planner realized there would be a lot more matches,
and with that many matches it decided a seq scan would be quicker
overall...

Maybe?  Seems I've heard Tom Lane say something to that effect, although
much more eloquently :-)

-philip

Re: Weird performance drop after VACUUM

From
Michael Fuhr
Date:
On Fri, Aug 26, 2005 at 03:52:24PM -0700, asif ali wrote:
> I have the same issue. After doing "VACUUM ANALYZE"
> performance of the query dropped.

Your EXPLAIN output doesn't show the actual query times -- could
you post the EXPLAIN ANALYZE output?  That'll also show how accurate
the planner's row count estimates are.

> Before "VACUUM ANALYZE"
>
> "Index Scan using conversion_table_pk on
> keyword_conversion_table c  (cost=0.00..18599.25
> rows=4986 width=95)"
> "  Index Cond: ((conversion_date >=
> '2005-06-07'::date) AND (conversion_date <=
> '2005-08-17'::date))"
>
> After "VACUUM ANALYZE"
>
> "Seq Scan on conversion_table c  (cost=0.00..29990.83
> rows=1094820 width=66)"
> "  Filter: ((conversion_date >= '2005-06-07'::date)
> AND (conversion_date <= '2005-08-17'::date))"
>
> I don't know why the system is doing a "Seq Scan" now.

Notice the row count estimates: 4986 in the "before" query and
1094820 in the "after" query.  In the latter, the planner thinks
it has to fetch so much of the table that a sequential scan would
be faster than an index scan.  You can see whether that guess is
correct by disabling enable_seqscan to force an index scan.  It
might be useful to see the output of the following:

SET enable_seqscan TO on;
SET enable_indexscan TO off;
EXPLAIN ANALYZE SELECT ...;

SET enable_seqscan TO off;
SET enable_indexscan TO on;
EXPLAIN ANALYZE SELECT ...;

You might also experiment with planner variables like effective_cache_size
and random_page_cost to see how changing them affects the query
plan.  However, be careful of tuning the system based on one query:
make sure adjustments result in reasonable plans for many different
queries.

--
Michael Fuhr

Re: Weird performance drop after VACUUM

From
Tom Lane
Date:
Ümit Öztosun <umit@likyabilisim.com> writes:
> We are using PostgreSQL for our business application. Recently, during
> testing of our application with large volumes of data, we faced a weird
> problem. Our query performance dropped *dramatically* after "VACUUM FULL
> ANALYZE" command.

I think the problem is that the planner is underestimating the cost of
evaluating this complicated filter condition:

>                 ->  Seq Scan on scf_irsaliye irs  (cost=0.00..30.00 rows=1 width=20) (actual time=0.290..0.290 rows=0 loops=10000)
>                       Filter: (((karsifirma)::text = 'KENDI'::text) AND (((turu)::text = 'MAI'::text) OR ((turu)::text = 'KGI'::text) OR ((turu)::text = 'PS'::text) OR ((turu)::text = 'TS'::text) OR ((turu)::text = 'KC'::text) OR ((turu)::text = 'KCO'::text)) AND (((_key_sis_depo_dest)::text = '$$$$0000003l$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$00000048$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004b$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004d$1$$'::text)) AND (tarih <= '2005-08-26'::date))

While you could attack that by raising the cpu_operator_cost parameter,
it would also be worth inquiring *why* the condition is so expensive to
evaluate.  I am suspicious that you are running the database in a locale
in which strcoll() is really slow.  Can you run it in C locale instead,
or do you really need locale-aware behavior?  Can you switch to a
different database encoding?  (A single-byte encoding such as Latin1
might be faster than UTF8, for example.)

Another possibility is to take a hard look at whether you can't simplify
the filter condition, but that'd require more knowledge of your
application than I have.

Or you could just play with the order of the filter conditions ... for
example, the date condition at the end is probably far cheaper to test
than the text comparisons, so if that's fairly selective it'd be worth
putting it first.
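As a sketch (an assumption about your intent, not tested against your data), the
reordered condition on scf_irsaliye might look like this; only the order of the
AND-ed clauses changes, and the OR chains are collapsed into IN lists for
readability:

```sql
-- Sketch only: same predicate as in the quoted plan, with the cheap and
-- (presumably) selective date test written first.  In 8.0 the executor
-- evaluates AND-ed clauses essentially in the order written; only clauses
-- involving sub-SELECTs get pushed to the end.
...
WHERE irs.tarih <= '2005-08-26'
  AND irs.karsifirma = 'KENDI'
  AND irs.turu IN ('MAI', 'KGI', 'PS', 'TS', 'KC', 'KCO')
  AND irs._key_sis_depo_dest IN ('$$$$0000003l$1$$', '$$$$00000048$1$$',
                                 '$$$$0000004b$1$$', '$$$$0000004d$1$$')
```

Whether this helps depends on how selective the date test actually is for
your data.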

            regards, tom lane

Re: Weird performance drop after VACUUM

From
asif ali
Date:
Thanks Michael for your reply.

Here is the performance on the database on which I did
VACUUM ANALYZE:

explain analyze
select   keyword_id
    ,sum(daily_impressions) as daily_impressions
    ,sum(daily_actions)as daily_actions
 from  conversion_table c where    c.conversion_date
BETWEEN '2005-06-07' and '2005-08-17'
    group by keyword_Id

"GroupAggregate  (cost=195623.66..206672.52 rows=20132
width=16) (actual time=8205.283..10139.369 rows=55291
loops=1)"
"  ->  Sort  (cost=195623.66..198360.71 rows=1094820
width=16) (actual time=8205.114..9029.501 rows=863883
loops=1)"
"        Sort Key: keyword_id"
"        ->  Seq Scan on keyword_conversion_table c
(cost=0.00..29990.83 rows=1094820 width=16) (actual
time=0.057..1422.319 rows=863883 loops=1)"
"              Filter: ((conversion_date >=
'2005-06-07'::date) AND (conversion_date <=
'2005-08-17'::date))"
"Total runtime: 14683.617 ms"


Now see what happens if I change the query and comment out one
column:

explain analyze
select   keyword_id
    ,sum(daily_impressions) as daily_impressions
--    ,sum(daily_actions)as daily_actions
 from  conversion_table c where    c.conversion_date
BETWEEN '2005-06-07' and '2005-08-17'
    group by keyword_Id


"HashAggregate  (cost=27373.51..27373.52 rows=2
width=16) (actual time=3030.386..3127.073 rows=55717
loops=1)"
"  ->  Seq Scan on conversion_table c
(cost=0.00..27336.12 rows=4986 width=16) (actual
time=0.050..1357.164 rows=885493 loops=1)"
"        Filter: ((conversion_date >=
'2005-06-07'::date) AND (conversion_date <=
'2005-08-17'::date))"
"Total runtime: 3159.162 ms"


I noticed "GroupAggregate" changed to "HashAggregate"
and the runtime dropped from 14 sec to 3 sec.


On the other hand, I have another database on which I did
not run "VACUUM ANALYZE", and it is working fine.


explain analyze
select   keyword_id
    ,sum(daily_impressions) as daily_impressions
    ,sum(daily_actions)as daily_actions
 from  conversion_table c where    c.conversion_date
BETWEEN '2005-06-07' and '2005-08-17'
    group by keyword_Id


"HashAggregate  (cost=27373.51..27373.52 rows=2
width=16) (actual time=3024.289..3120.324 rows=55717
loops=1)"
"  ->  Seq Scan on conversion_table c
(cost=0.00..27336.12 rows=4986 width=16) (actual
time=0.047..1352.212 rows=885493 loops=1)"
"        Filter: ((conversion_date >=
'2005-06-07'::date) AND (conversion_date <=
'2005-08-17'::date))"
"Total runtime: 3152.437 ms"


I am new to postgres. Thanks in advance.


asif ali






--- Michael Fuhr <mike@fuhr.org> wrote:

> On Fri, Aug 26, 2005 at 03:52:24PM -0700, asif ali
> wrote:
> > I have the same issue. After doing "VACUUM
> ANALYZE"
> > performance of the query dropped.
>
> Your EXPLAIN output doesn't show the actual query
> times -- could
> you post the EXPLAIN ANALYZE output?  That'll also
> show how accurate
> the planner's row count estimates are.
>
> > Before "VACUUM ANALYZE"
> >
> > "Index Scan using conversion_table_pk on
> > keyword_conversion_table c  (cost=0.00..18599.25
> > rows=4986 width=95)"
> > "  Index Cond: ((conversion_date >=
> > '2005-06-07'::date) AND (conversion_date <=
> > '2005-08-17'::date))"
> >
> > After "VACUUM ANALYZE"
> >
> > "Seq Scan on conversion_table c
> (cost=0.00..29990.83
> > rows=1094820 width=66)"
> > "  Filter: ((conversion_date >=
> '2005-06-07'::date)
> > AND (conversion_date <= '2005-08-17'::date))"
> >
> > I don't know why the system is doing a "Seq Scan" now.
>
> Notice the row count estimates: 4986 in the "before"
> query and
> 1094820 in the "after" query.  In the latter, the
> planner thinks
> it has to fetch so much of the table that a
> sequential scan would
> be faster than an index scan.  You can see whether
> that guess is
> correct by disabling enable_seqscan to force an
> index scan.  It
> might be useful to see the output of the following:
>
> SET enable_seqscan TO on;
> SET enable_indexscan TO off;
> EXPLAIN ANALYZE SELECT ...;
>
> SET enable_seqscan TO off;
> SET enable_indexscan TO on;
> EXPLAIN ANALYZE SELECT ...;
>
> You might also experiment with planner variables
> like effective_cache_size
> and random_page_cost to see how changing them
> affects the query
> plan.  However, be careful of tuning the system
> based on one query:
> make sure adjustments result in reasonable plans for
> many different
> queries.
>
> --
> Michael Fuhr






Re: Weird performance drop after VACUUM

From
Michael Fuhr
Date:
On Fri, Aug 26, 2005 at 05:10:49PM -0700, asif ali wrote:
> "GroupAggregate  (cost=195623.66..206672.52 rows=20132
> width=16) (actual time=8205.283..10139.369 rows=55291
> loops=1)"
> "  ->  Sort  (cost=195623.66..198360.71 rows=1094820
> width=16) (actual time=8205.114..9029.501 rows=863883
> loops=1)"
> "        Sort Key: keyword_id"
> "        ->  Seq Scan on keyword_conversion_table c
> (cost=0.00..29990.83 rows=1094820 width=16) (actual
> time=0.057..1422.319 rows=863883 loops=1)"
> "              Filter: ((conversion_date >=
> '2005-06-07'::date) AND (conversion_date <=
> '2005-08-17'::date))"
> "Total runtime: 14683.617 ms"

What are your effective_cache_size and work_mem (8.x) or sort_mem (7.x)
settings?  How much RAM does the machine have?  If you have enough
memory then raising those variables should result in better plans;
you might also want to experiment with random_page_cost.  Be careful
not to set work_mem/sort_mem too high, though.  See "Run-time
Configuration" in the "Server Run-time Environment" chapter of the
documentation for more information about these variables.
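For example (illustrative values only, assuming a machine with around 1 GB of
RAM; on 8.0, effective_cache_size is in 8 kB pages and work_mem in kB):

```sql
SET effective_cache_size = 65536;  -- ~512 MB assumed available as OS cache
SET work_mem = 32768;              -- 32 MB per sort/hash (sort_mem on 7.x)
SET random_page_cost = 3;          -- below the default 4 if data is mostly cached
```

With more work_mem available, the planner may also replace the Sort +
GroupAggregate in the plan above with a single HashAggregate.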

--
Michael Fuhr

Re: Weird performance drop after VACUUM

From
Umit Oztosun
Date:
On Fri, 2005-08-26 at 19:31 -0400, Tom Lane wrote:
> I think the problem is that the planner is underestimating the cost of
> evaluating this complicated filter condition:
>
> >                 ->  Seq Scan on scf_irsaliye irs  (cost=0.00..30.00 rows=1 width=20) (actual time=0.290..0.290 rows=0 loops=10000)
> >                       Filter: (((karsifirma)::text = 'KENDI'::text) AND (((turu)::text = 'MAI'::text) OR ((turu)::text = 'KGI'::text) OR ((turu)::text = 'PS'::text) OR ((turu)::text = 'TS'::text) OR ((turu)::text = 'KC'::text) OR ((turu)::text = 'KCO'::text)) AND (((_key_sis_depo_dest)::text = '$$$$0000003l$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$00000048$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004b$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004d$1$$'::text)) AND (tarih <= '2005-08-26'::date))
>
> While you could attack that by raising the cpu_operator_cost parameter,
> it would also be worth inquiring *why* the condition is so expensive to
> evaluate.  I am suspicious that you are running the database in a locale
> in which strcoll() is really slow.  Can you run it in C locale instead,
> or do you really need locale-aware behavior?  Can you switch to a
> different database encoding?  (A single-byte encoding such as Latin1
> might be faster than UTF8, for example.)

Yes, you are perfectly right. We are using UTF8 and the tr_TR.UTF8 locale.
However, I tried the same tests with latin1 and the C locale; it is surely
faster, but not dramatically:

                      Before Vacuum    After Vacuum
UTF8 and tr_TR.UTF8:       ~8 s            ~110 s
latin1 and C:              ~7 s             ~65 s

I also played with the cpu_operator_cost parameter, and it dramatically
reduced query times, though not to the pre-vacuum level:

                      Before Vacuum    After Vacuum
UTF8 and tr_TR.UTF8:       ~8 s            ~11 s
latin1 and C:              ~7 s             ~9 s

These values are much better, but I really wonder if I can reach the
performance levels from before the vacuum. I am also worried about
side-effects that may be caused by the non-default cpu_operator_cost
parameter.

> Another possibility is to take a hard look at whether you can't simplify
> the filter condition, but that'd require more knowledge of your
> application than I have.

Yes that is another option, we are even considering schema changes to
use less character types, but these are really costly and error-prone
operations at the moment.

> Or you could just play with the order of the filter conditions ... for
> example, the date condition at the end is probably far cheaper to test
> than the text comparisons, so if that's fairly selective it'd be worth
> putting it first.

We are experimenting with this.

Thanks for your help!

Best Regards,
Umit Oztosun


Re: Weird performance drop after VACUUM

From
"Steinar H. Gunderson"
Date:
On Fri, Aug 26, 2005 at 07:31:51PM -0400, Tom Lane wrote:
> Or you could just play with the order of the filter conditions ... for
> example, the date condition at the end is probably far cheaper to test
> than the text comparisons, so if that's fairly selective it'd be worth
> putting it first.

That's an interesting approach -- could the planner do such things itself?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Weird performance drop after VACUUM

From
"Steinar H. Gunderson"
Date:
On Sat, Aug 27, 2005 at 11:05:01AM -0400, Tom Lane wrote:
> It could, but it doesn't really have enough information.  We don't
> currently have any model that some operators are more expensive than
> others.  IIRC the only sort of reordering the current code will do
> in a filter condition list is to push clauses involving sub-SELECTs
> to the end.

I was more thinking along the lines of reordering "a AND/OR b" to "b AND/OR
a" if b has lower selectivity than a.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Weird performance drop after VACUUM

From
Tom Lane
Date:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes:
> On Sat, Aug 27, 2005 at 11:05:01AM -0400, Tom Lane wrote:
>> It could, but it doesn't really have enough information.  We don't
>> currently have any model that some operators are more expensive than
>> others.  IIRC the only sort of reordering the current code will do
>> in a filter condition list is to push clauses involving sub-SELECTs
>> to the end.

> I was more thinking along the lines of reordering "a AND/OR b" to "b AND/OR
> a" if b has lower selectivity than a.

Yeah, but if b is considerably more expensive to evaluate than a, that
could still be a net loss.  To do it correctly you really need to trade
off cost of evaluation against selectivity, and the planner currently
only knows something about the latter (and all too often, not enough :-().

I'd like to do this someday, but until we get some cost info in there
I think it'd be a mistake to do much re-ordering of conditions.
Currently the SQL programmer can determine what happens by writing his
query carefully --- if we reorder based on selectivity only, we could
make things worse, and there'd be no way to override it.

            regards, tom lane

Re: Weird performance drop after VACUUM

From
Tom Lane
Date:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes:
> On Fri, Aug 26, 2005 at 07:31:51PM -0400, Tom Lane wrote:
>> Or you could just play with the order of the filter conditions ... for
>> example, the date condition at the end is probably far cheaper to test
>> than the text comparisons, so if that's fairly selective it'd be worth
>> putting it first.

> That's an interesting approach -- could the planner do such things itself?

It could, but it doesn't really have enough information.  We don't
currently have any model that some operators are more expensive than
others.  IIRC the only sort of reordering the current code will do
in a filter condition list is to push clauses involving sub-SELECTs
to the end.

            regards, tom lane

Re: Weird performance drop after VACUUM

From
asif ali
Date:
Michael,
The database is on the same system.
What I am doing is only "VACUUM ANALYZE
conversion_table".

I did the same thing on a newly created database
and got the same result: after "VACUUM ANALYZE",
performance dropped.
Please see this. The runtime changes from "7755.115" to
"14859.291" ms.


explain analyze
select keyword_id,sum(daily_impressions) as
daily_impressions ,
     sum(daily_clicks) as daily_clicks,
COALESCE(sum(daily_cpc::double precision),0) as
daily_cpc, sum(daily_revenues)as daily_revenues,
sum(daily_actions)as daily_actions
     ,count(daily_cpc) as count from  conversion_table c
where    c.conversion_date BETWEEN '2005-06-07' and
'2005-08-17'
    group by keyword_Id

"HashAggregate  (cost=18686.51..18686.54 rows=2
width=52) (actual time=7585.827..7720.370 rows=55717
loops=1)"
"  ->  Index Scan using conversion_table_pk on
conversion_table c  (cost=0.00..18599.25 rows=4986
width=52) (actual time=0.129..2882.066 rows=885493
loops=1)"
"        Index Cond: ((conversion_date >=
'2005-06-07'::date) AND (conversion_date <=
'2005-08-17'::date))"
"Total runtime: 7755.115 ms"


VACUUM analyze  conversion_table


explain analyze

select keyword_id,sum(daily_impressions) as
daily_impressions ,
     sum(daily_clicks) as daily_clicks,
COALESCE(sum(daily_cpc::double precision),0) as
daily_cpc, sum(daily_revenues)as daily_revenues,
sum(daily_actions)as daily_actions
     ,count(daily_cpc) as count from  conversion_table c
where    c.conversion_date BETWEEN '2005-06-07' and
'2005-08-17'
    group by keyword_Id


"GroupAggregate  (cost=182521.76..200287.99 rows=20093
width=37) (actual time=8475.580..12618.793 rows=55717
loops=1)"
"  ->  Sort  (cost=182521.76..184698.58 rows=870730
width=37) (actual time=8475.246..9418.068 rows=885493
loops=1)"
"        Sort Key: keyword_id"
"        ->  Seq Scan on conversion_table c
(cost=0.00..27336.12 rows=870730 width=37) (actual
time=0.007..1520.788 rows=885493 loops=1)"
"              Filter: ((conversion_date >=
'2005-06-07'::date) AND (conversion_date <=
'2005-08-17'::date))"
"Total runtime: 14859.291 ms"









--- Michael Fuhr <mike@fuhr.org> wrote:

> On Fri, Aug 26, 2005 at 05:10:49PM -0700, asif ali
> wrote:
> > "GroupAggregate  (cost=195623.66..206672.52
> rows=20132
> > width=16) (actual time=8205.283..10139.369
> rows=55291
> > loops=1)"
> > "  ->  Sort  (cost=195623.66..198360.71
> rows=1094820
> > width=16) (actual time=8205.114..9029.501
> rows=863883
> > loops=1)"
> > "        Sort Key: keyword_id"
> > "        ->  Seq Scan on keyword_conversion_table
> c
> > (cost=0.00..29990.83 rows=1094820 width=16)
> (actual
> > time=0.057..1422.319 rows=863883 loops=1)"
> > "              Filter: ((conversion_date >=
> > '2005-06-07'::date) AND (conversion_date <=
> > '2005-08-17'::date))"
> > "Total runtime: 14683.617 ms"
>
> What are your effective_cache_size and work_mem
> (8.x) or sort_mem (7.x)
> settings?  How much RAM does the machine have?  If
> you have enough
> memory then raising those variables should result in
> better plans;
> you might also want to experiment with
> random_page_cost.  Be careful
> not to set work_mem/sort_mem too high, though.  See
> "Run-time
> Configuration" in the "Server Run-time Environment"
> chapter of the
> documentation for more information about these
> variables.
>
> --
> Michael Fuhr






Re: Weird performance drop after VACUUM

From
Michael Fuhr
Date:
On Mon, Aug 29, 2005 at 11:07:17AM -0700, asif ali wrote:
> The database is on the same system.
> What I am doing is only "VACUUM analyze
> conversion_table"
>
> I did the same thing on a newly created database.
> And got the same result. So after "VACUUM analyze"
> performance dropped.
> Please see this. Runtime changes from "7755.115" to
> "14859.291" ms

As has been pointed out a couple of times, you're getting a different
plan after VACUUM ANALYZE because the row count estimates are more
accurate.  Unfortunately the more accurate estimates result in a
query plan that's slower than the plan for the less accurate
estimates.  PostgreSQL *thinks* the plan will be faster but your
results show that it isn't, so you might need to adjust some of the
planner's cost constants.

I asked some questions that you didn't answer, so I'll ask them again:

What's your effective_cache_size setting?
What's your work_mem (8.x) or sort_mem (7.x) setting?
What's your random_page_cost setting?
How much available RAM does the machine have?
What version of PostgreSQL are you running?
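You can check the current values from a psql session, e.g.:

```sql
SHOW effective_cache_size;
SHOW work_mem;            -- SHOW sort_mem; on 7.x
SHOW random_page_cost;
SELECT version();
```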

Various tuning guides give advice on how to set the above and other
configuration variables.  Here's one such guide:

http://www.powerpostgresql.com/PerfList/

--
Michael Fuhr

Re: Weird performance drop after VACUUM

From
asif ali
Date:
Michael,
The effective_cache_size, random_page_cost, and work_mem
settings were at their defaults (commented out).
I have changed these settings and now the
performance is better; see below.

"HashAggregate  (cost=42573.89..42925.52 rows=20093
width=37) (actual time=5273.984..5430.733 rows=55717
loops=1)"
"  ->  Seq Scan on keyword_conversion_table c
(cost=0.00..27336.12 rows=870730 width=37) (actual
time=0.052..1405.576 rows=885493 loops=1)"
"        Filter: ((conversion_date >=
'2005-06-07'::date) AND (conversion_date <=
'2005-08-17'::date))"
"Total runtime: 5463.764 ms"



Thanks a lot



--- Michael Fuhr <mike@fuhr.org> wrote:

> On Mon, Aug 29, 2005 at 11:07:17AM -0700, asif ali
> wrote:
> > The database is on the same system.
> > What I am doing is only "VACUUM analyze
> > conversion_table"
> >
> > I did the the same thing on a newly created
> database.
> > And got the same result. So after "VACUUM analyze"
> > performance dropped.
> > Please see this. Runtime changes from "7755.115"
> to
> > "14859.291" ms
>
> As has been pointed out a couple of times, you're
> getting a different
> plan after VACUUM ANALYZE because the row count
> estimates are more
> accurate.  Unfortunately the more accurate estimates
> result in a
> query plan that's slower than the plan for the less
> accurate
> estimates.  PostgreSQL *thinks* the plan will be
> faster but your
> results show that it isn't, so you might need to
> adjust some of the
> planner's cost constants.
>
> I asked some questions that you didn't answer, so
> I'll ask them again:
>
> What's your effective_cache_size setting?
> What's your work_mem (8.x) or sort_mem (7.x)
> setting?
> What's your random_page_cost setting?
> How much available RAM does the machine have?
> What version of PostgreSQL are you running?
>
> Various tuning guides give advice on how to set the
> above and other
> configuration variables.  Here's one such guide:
>
> http://www.powerpostgresql.com/PerfList/
>
> --
> Michael Fuhr



