Thread: Weird performance drop after VACUUM
Hello,

We are using PostgreSQL for our business application. Recently, during
testing of our application with large volumes of data, we faced a weird
problem: our query performance dropped *dramatically* after a "VACUUM FULL
ANALYZE" command. We found a similar problem in the mailing list archives,
but that submitter solved his problem by rewriting his query, which is
unfortunately very hard for us.

I am attaching two EXPLAIN ANALYZE outputs; the first is from just before
the VACUUM FULL ANALYZE command and the other is from after it. Also
attached is the SQL query, which has been simplified to clarify the
problem. In the example, query time increases from 1.8 seconds to > 4.0
seconds. The difference for the complete query is much bigger: query time
increases from 7.8 seconds to > 110 seconds.

Any help is appreciated; we were unable to identify what causes the query
planner to choose a different, poorly performing plan.

Notes: Our production platform is Ubuntu Linux Hoary on i386, PostgreSQL
8.0.3, compiled from sources. The same tests were carried out on Windows XP
Professional with PostgreSQL 8.0.1, with similar results. The queries use
little I/O but high CPU. The largest table involved in the sample query has
about 10000 rows. Indexes are used intensively; some tables use > 4
indexes.

Best regards,
Umit Oztosun
Hi,

I have the same issue. After doing "VACUUM ANALYZE" the performance of the
query dropped.

Here is the query:

explain select * from conversion_table c
where c.conversion_date BETWEEN '2005-06-07' and '2005-08-17'

Before "VACUUM ANALYZE":

"Index Scan using conversion_table_pk on keyword_conversion_table c  (cost=0.00..18599.25 rows=4986 width=95)"
"  Index Cond: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"

After "VACUUM ANALYZE":

"Seq Scan on conversion_table c  (cost=0.00..29990.83 rows=1094820 width=66)"
"  Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"

I don't know why the system is doing a "Seq Scan" now.

Thanks

asif ali

--- Ümit Öztosun <umit@likyabilisim.com> wrote:

> Hello,
>
> We are using PostgreSQL for our business application. Recently, during
> testing of our application with large volumes of data, we faced a weird
> problem. Our query performance dropped *dramatically* after "VACUUM FULL
> ANALYZE" command. [...]
>
> SELECT * FROM (
>     SELECT
>         COALESCE (
>             (SELECT COALESCE (sum(irskal.anamiktar), 0)
>                 * (SELECT birim.fiyat2 * (SELECT kur1
>                           FROM sis_doviz_kuru kur
>                           WHERE birim._key_sis_doviz2 = kur._key_sis_doviz
>                           ORDER BY tarih desc
>                           LIMIT 1)
>                    FROM scf_stokkart_birimleri birim
>                    WHERE _key_scf_stokkart = stok._key
>                      AND anabirim = '1')
>              FROM scf_irsaliye irs, scf_irsaliye_kalemi irskal
>              WHERE irskal._key_kalemturu = stok._key
>                AND irskal._key_scf_irsaliye = irs._key
>                AND irs.karsifirma = 'KENDI'
>                AND (irs.turu='MAI' OR irs.turu='KGI' OR irs.turu='PS' OR
>                     irs.turu='TS' OR irs.turu='KC' OR irs.turu='KCO')
>                AND (irs._key_sis_depo_dest = '$$$$0000003l$1$$' OR
>                     irs._key_sis_depo_dest = '$$$$00000048$1$$' OR
>                     irs._key_sis_depo_dest = '$$$$0000004b$1$$' OR
>                     irs._key_sis_depo_dest = '$$$$0000004d$1$$')
>                AND ((irskal._key LIKE '0000%' OR irskal._key LIKE '0101%' OR
>                      irskal._key LIKE '$$%'))
>                AND irs.tarih <= '2005-08-26'
>             ), 0
>         ) as arti_fiili_irs_karsifirma,
>         stok.*
>     FROM scf_stokkart stok
> ) AS _SWT
> WHERE (_key LIKE '00%' OR _key LIKE '01%' OR _key LIKE '$$%')
> ORDER BY _key desc
>
> Before VACUUM FULL ANALYZE - Short Query
> ---------------------------------------
> Sort  (cost=9094.31..9094.40 rows=37 width=817) (actual time=1852.799..1877.738 rows=10000 loops=1)
>   Sort Key: stok._key
>   ->  Seq Scan on scf_stokkart stok  (cost=0.00..9093.34 rows=37 width=817) (actual time=8.670..1575.586 rows=10000 loops=1)
>         Filter: (((_key)::text ~~ '00%'::text) OR ((_key)::text ~~ '01%'::text) OR ((_key)::text ~~ '$$%'::text))
>         SubPlan
>           ->  Aggregate  (cost=237.29..237.29 rows=1 width=16) (actual time=0.136..0.138 rows=1 loops=10000)
>                 InitPlan
>                   ->  Index Scan using scf_stokkart_birimleri_key_scf_stokkart_idx on scf_stokkart_birimleri birim  (cost=0.00..209.59 rows=1 width=58) (actual time=0.088..0.093 rows=1 loops=10000)
>                         Index Cond: ((_key_scf_stokkart)::text = ($1)::text)
>                         Filter: (anabirim = '1'::bpchar)
>                         SubPlan
>                           ->  Limit  (cost=9.31..9.31 rows=1 width=17) (actual time=0.046..0.048 rows=1 loops=10000)
>                                 ->  Sort  (cost=9.31..9.31 rows=2 width=17) (actual time=0.041..0.041 rows=1 loops=10000)
>                                       Sort Key: tarih
>                                       ->  Index Scan using sis_doviz_kuru_key_sis_doviz_idx on sis_doviz_kuru kur  (cost=0.00..9.30 rows=2 width=17) (actual time=0.018..0.029 rows=2 loops=10000)
>                                             Index Cond: (($0)::text = (_key_sis_doviz)::text)
>                 ->  Nested Loop  (cost=0.00..27.69 rows=1 width=16) (actual time=0.033..0.033 rows=0 loops=10000)
>                       ->  Index Scan using scf_irsaliye_kalemi_key_kalemturu_idx on scf_irsaliye_kalemi irskal  (cost=0.00..21.75 rows=1 width=58) (actual time=0.017..0.020 rows=0 loops=10000)
>                             Index Cond: ((_key_kalemturu)::text = ($1)::text)
>                             Filter: (((_key)::text ~~ '0000%'::text) OR ((_key)::text ~~ '0101%'::text) OR ((_key)::text ~~ '$$%'::text))
>                       ->  Index Scan using scf_irsaliye_pkey on scf_irsaliye irs  (cost=0.00..5.94 rows=1 width=42) (actual time=0.021..0.021 rows=0 loops=3000)
>                             Index Cond: (("outer"._key_scf_irsaliye)::text = (irs._key)::text)
>                             Filter: (((karsifirma)::text = 'KENDI'::text) AND (((turu)::text = 'MAI'::text) OR ((turu)::text = 'KGI'::text) OR ((turu)::text = 'PS'::text) OR ((turu)::text = 'TS'::text) OR ((turu)::text = 'KC'::text) OR ((turu)::text = 'KCO'::text)) AND (((_key_sis_depo_dest)::text = '$$$$0000003l$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$00000048$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004b$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004d$1$$'::text)) AND (tarih <= '2005-08-26'::date))
> Total runtime: 1899.533 ms
>
> After VACUUM FULL ANALYZE - Short Query
> ---------------------------------------
> Index Scan Backward using scf_stokkart_pkey on scf_stokkart stok  (cost=0.00..392045.63 rows=9998 width=166) (actual time=0.661..4431.568 rows=10000 loops=1)
>   Filter: (((_key)::text ~~ '00%'::text) OR ((_key)::text ~~ '01%'::text) OR ((_key)::text ~~ '$$%'::text))
>   SubPlan
>     ->  Aggregate  (cost=39.16..39.16 rows=1 width=10) (actual time=0.416..0.418 rows=1 loops=10000)
>           InitPlan
>             ->  Index Scan using scf_stokkart_birimleri_key_scf_stokkart_idx on scf_stokkart_birimleri birim  (cost=0.00..5.25 rows=2 width=28) (actual time=0.101..0.105 rows=1 loops=10000)
>                   Index Cond: ((_key_scf_stokkart)::text = ($1)::text)
>                   Filter: (anabirim = '1'::bpchar)
>                   SubPlan
>                     ->  Limit  (cost=1.08..1.09 rows=1 width=15) (actual time=0.048..0.050 rows=1 loops=10000)
>                           ->  Sort  (cost=1.08..1.09 rows=2 width=15) (actual time=0.043..0.043 rows=1 loops=10000)
>                                 Sort Key: tarih
>                                 ->  Seq Scan on sis_doviz_kuru kur  (cost=0.00..1.07 rows=2 width=15) (actual time=0.009..0.026 rows=2 loops=10000)
>                                       Filter: (($0)::text = (_key_sis_doviz)::text)
>           ->  Nested Loop  (cost=0.00..33.90 rows=1 width=10) (actual time=0.295..0.295 rows=0 loops=10000)
>                 ->  Seq Scan on scf_irsaliye irs  (cost=0.00..30.00 rows=1 width=20) (actual time=0.290..0.290 rows=0 loops=10000)
>                       Filter: (((karsifirma)::text = 'KENDI'::text) AND (((turu)::text = 'MAI'::text) OR ((turu)::text = 'KGI'::text) OR ((turu)::text = 'PS'::text) OR ((turu)::text = 'TS'::text) OR ((turu)::text = 'KC'::text) OR ((turu)::text = 'KCO'::text)) AND (((_key_sis_depo_dest)::text = '$$$$0000003l$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$00000048$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004b$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004d$1$$'::text)) AND (tarih <= '2005-08-26'::date))
>                 ->  Index Scan using scf_irsaliye_kalemi_key_scf_irsaliye_idx on scf_irsaliye_kalemi irskal  (cost=0.00..3.89 rows=1 width=30) (never executed)
>                       Index Cond: ((irskal._key_scf_irsaliye)::text = ("outer"._key)::text)
>                       Filter: (((_key_kalemturu)::text = ($1)::text) AND (((_key)::text ~~ '0000%'::text) OR ((_key)::text ~~ '0101%'::text) OR ((_key)::text ~~ '$$%'::text)))
> Total runtime: 4456.895 ms
> Hi,
> I have the same issue. After doing "VACUUM ANALYZE" the performance of
> the query dropped.
>
> Before "VACUUM ANALYZE":
>
> "Index Scan using conversion_table_pk on keyword_conversion_table c  (cost=0.00..18599.25 rows=4986 width=95)"
>
> After "VACUUM ANALYZE":
>
> "Seq Scan on conversion_table c  (cost=0.00..29990.83 rows=1094820 width=66)"
>
> I don't know why the system is doing a "Seq scan" now.

I could be wrong, as I'm definitely no expert on reading the output of
EXPLAIN, but it seems to say that prior to the VACUUM it was expecting to
retrieve 4986 rows, and afterwards expecting to retrieve 1094820 rows,
which is a pretty big difference. So maybe the statistics were just really,
really off prior to vacuuming, and once it did vacuum it realized there
would be a lot more matches; and since there were a lot more matches, the
planner decided to do a seq scan since it would be quicker overall...
Maybe? Seems I've heard Tom Lane say something to that effect, although
much more eloquently :-)

-philip
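Philip's point about the estimates can be sketched in miniature. The toy
Python below is not PostgreSQL's actual code: the histogram bounds and
table size are invented for illustration. It mimics how ANALYZE-gathered
equi-depth histogram bounds let a planner interpolate the fraction of rows
inside a date range, turning a tiny default guess into a large (and far
more realistic) row estimate.

```python
# Toy model of histogram-based range selectivity (not PostgreSQL source).
# Each adjacent pair of bounds holds roughly 1/(n-1) of the table's rows.
from datetime import date

def histogram_range_selectivity(bounds, lo, hi):
    """Estimate the fraction of rows with lo <= value <= hi."""
    nbins = len(bounds) - 1
    covered = 0.0
    for i in range(nbins):
        b_lo, b_hi = bounds[i], bounds[i + 1]
        if b_hi < lo or b_lo > hi:
            continue  # bin lies entirely outside the queried range
        # Fraction of this bin overlapping [lo, hi], by linear interpolation.
        span = (b_hi - b_lo).days or 1
        overlap = (min(b_hi, hi) - max(b_lo, lo)).days
        covered += max(overlap, 0) / span
    return covered / nbins

# Hypothetical histogram for conversion_date: most data in Jun-Aug 2005.
bounds = [date(2005, 5, 1), date(2005, 6, 10), date(2005, 7, 1),
          date(2005, 7, 20), date(2005, 8, 10), date(2005, 9, 1)]
sel = histogram_range_selectivity(bounds, date(2005, 6, 7), date(2005, 8, 17))
rows_in_table = 1_100_000
print(round(sel * rows_in_table))  # prints 746500 with these toy numbers
```

With no statistics the "estimate" is a hard-coded default; with the
histogram it tracks the data, which is exactly why the row count jumped
from 4986 to 1094820 after ANALYZE.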
On Fri, Aug 26, 2005 at 03:52:24PM -0700, asif ali wrote:
> I have the same issue. After doing "VACUUM ANALYZE" the performance of
> the query dropped.

Your EXPLAIN output doesn't show the actual query times -- could you post
the EXPLAIN ANALYZE output? That'll also show how accurate the planner's
row count estimates are.

> Before "VACUUM ANALYZE":
>
> "Index Scan using conversion_table_pk on keyword_conversion_table c  (cost=0.00..18599.25 rows=4986 width=95)"
> "  Index Cond: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"
>
> After "VACUUM ANALYZE":
>
> "Seq Scan on conversion_table c  (cost=0.00..29990.83 rows=1094820 width=66)"
> "  Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"
>
> I don't know why the system is doing a "Seq scan" now.

Notice the row count estimates: 4986 in the "before" query and 1094820 in
the "after" query. In the latter, the planner thinks it has to fetch so
much of the table that a sequential scan would be faster than an index
scan. You can see whether that guess is correct by disabling
enable_seqscan to force an index scan. It might be useful to see the
output of the following:

SET enable_seqscan TO on;
SET enable_indexscan TO off;
EXPLAIN ANALYZE SELECT ...;

SET enable_seqscan TO off;
SET enable_indexscan TO on;
EXPLAIN ANALYZE SELECT ...;

You might also experiment with planner variables like effective_cache_size
and random_page_cost to see how changing them affects the query plan.
However, be careful of tuning the system based on one query: make sure
adjustments result in reasonable plans for many different queries.

-- 
Michael Fuhr
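The trade-off Michael describes can be caricatured with a few lines of
arithmetic. This is a deliberately crude sketch, not the real planner: the
table size is invented, and the index-scan model pessimistically charges
one random page fetch per matching row. The cost constants are the
8.0-era defaults.

```python
# Crude seq-scan vs. index-scan cost comparison (toy model, not PostgreSQL).
SEQ_PAGE_COST = 1.0          # sequential page fetch (implicit 1.0 in 8.0)
RANDOM_PAGE_COST = 4.0       # default random_page_cost
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.001

def seq_scan_cost(table_pages, table_rows):
    # Read every page sequentially, examine every row.
    return table_pages * SEQ_PAGE_COST + table_rows * CPU_TUPLE_COST

def index_scan_cost(est_rows):
    # Pessimistic sketch: one random heap page per estimated matching row.
    return est_rows * (RANDOM_PAGE_COST + CPU_TUPLE_COST + CPU_INDEX_TUPLE_COST)

TABLE_PAGES, TABLE_ROWS = 16_000, 1_400_000   # hypothetical table size

for est in (4_986, 1_094_820):   # the before/after estimates from the thread
    seq = seq_scan_cost(TABLE_PAGES, TABLE_ROWS)
    idx = index_scan_cost(est)
    winner = "Index Scan" if idx < seq else "Seq Scan"
    print(f"est_rows={est:>9}: seq={seq:>10.0f} index={idx:>10.0f} -> {winner}")
```

At ~5k estimated rows the index scan is cheaper on paper; at ~1.1M the
random-I/O charge dwarfs one sequential pass, so the planner flips to the
seq scan. Whether that is *actually* faster on a given machine is exactly
what the enable_seqscan experiment above reveals.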
Ümit Öztosun <umit@likyabilisim.com> writes:
> We are using PostgreSQL for our business application. Recently, during
> testing of our application with large volumes of data, we faced a weird
> problem. Our query performance dropped *dramatically* after "VACUUM FULL
> ANALYZE" command.

I think the problem is that the planner is underestimating the cost of
evaluating this complicated filter condition:

> ->  Seq Scan on scf_irsaliye irs  (cost=0.00..30.00 rows=1 width=20) (actual time=0.290..0.290 rows=0 loops=10000)
>       Filter: (((karsifirma)::text = 'KENDI'::text) AND (((turu)::text = 'MAI'::text) OR ((turu)::text = 'KGI'::text) OR ((turu)::text = 'PS'::text) OR ((turu)::text = 'TS'::text) OR ((turu)::text = 'KC'::text) OR ((turu)::text = 'KCO'::text)) AND (((_key_sis_depo_dest)::text = '$$$$0000003l$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$00000048$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004b$1$$'::text) OR ((_key_sis_depo_dest)::text = '$$$$0000004d$1$$'::text)) AND (tarih <= '2005-08-26'::date))

While you could attack that by raising the cpu_operator_cost parameter, it
would also be worth inquiring *why* the condition is so expensive to
evaluate. I am suspicious that you are running the database in a locale in
which strcoll() is really slow. Can you run it in C locale instead, or do
you really need locale-aware behavior? Can you switch to a different
database encoding? (A single-byte encoding such as Latin1 might be faster
than UTF8, for example.)

Another possibility is to take a hard look at whether you can't simplify
the filter condition, but that'd require more knowledge of your
application than I have.

Or you could just play with the order of the filter conditions ... for
example, the date condition at the end is probably far cheaper to test
than the text comparisons, so if that's fairly selective it'd be worth
putting it first.

			regards, tom lane
Thanks Michael for your reply.

Here is the performance on the database on which I did VACUUM ANALYZE:

explain analyze select keyword_id, sum(daily_impressions) as daily_impressions, sum(daily_actions) as daily_actions
from conversion_table c
where c.conversion_date BETWEEN '2005-06-07' and '2005-08-17'
group by keyword_Id

"GroupAggregate  (cost=195623.66..206672.52 rows=20132 width=16) (actual time=8205.283..10139.369 rows=55291 loops=1)"
"  ->  Sort  (cost=195623.66..198360.71 rows=1094820 width=16) (actual time=8205.114..9029.501 rows=863883 loops=1)"
"        Sort Key: keyword_id"
"        ->  Seq Scan on keyword_conversion_table c  (cost=0.00..29990.83 rows=1094820 width=16) (actual time=0.057..1422.319 rows=863883 loops=1)"
"              Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"
"Total runtime: 14683.617 ms"

Now see what happens if I change the query and comment out one column:

explain analyze select keyword_id, sum(daily_impressions) as daily_impressions
-- , sum(daily_actions) as daily_actions
from conversion_table c
where c.conversion_date BETWEEN '2005-06-07' and '2005-08-17'
group by keyword_Id

"HashAggregate  (cost=27373.51..27373.52 rows=2 width=16) (actual time=3030.386..3127.073 rows=55717 loops=1)"
"  ->  Seq Scan on conversion_table c  (cost=0.00..27336.12 rows=4986 width=16) (actual time=0.050..1357.164 rows=885493 loops=1)"
"        Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"
"Total runtime: 3159.162 ms"

I noticed "GroupAggregate" changes to "HashAggregate" and performance goes
from 14 sec to 3 sec.

On the other hand, I have another database on which I did not do "VACUUM
ANALYZE", and it is working fine.
explain analyze select keyword_id, sum(daily_impressions) as daily_impressions, sum(daily_actions) as daily_actions
from conversion_table c
where c.conversion_date BETWEEN '2005-06-07' and '2005-08-17'
group by keyword_Id

"HashAggregate  (cost=27373.51..27373.52 rows=2 width=16) (actual time=3024.289..3120.324 rows=55717 loops=1)"
"  ->  Seq Scan on conversion_table c  (cost=0.00..27336.12 rows=4986 width=16) (actual time=0.047..1352.212 rows=885493 loops=1)"
"        Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"
"Total runtime: 3152.437 ms"

I am new to postgres. Thanks in advance.

asif ali

--- Michael Fuhr <mike@fuhr.org> wrote:

> Your EXPLAIN output doesn't show the actual query times -- could you
> post the EXPLAIN ANALYZE output? That'll also show how accurate the
> planner's row count estimates are.
> [...]
> Notice the row count estimates: 4986 in the "before" query and 1094820
> in the "after" query. In the latter, the planner thinks it has to fetch
> so much of the table that a sequential scan would be faster than an
> index scan. You can see whether that guess is correct by disabling
> enable_seqscan to force an index scan.
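The two aggregation strategies in asif's plans can be sketched in
miniature. This toy Python (invented data, not PostgreSQL internals) mirrors
the plan shapes: GroupAggregate sorts the input and sums runs of equal
keys, while HashAggregate accumulates into an in-memory table keyed by
keyword_id. Both give identical totals; hashing skips the expensive sort
but needs the whole hash table to fit in memory (work_mem in PostgreSQL),
which is why the planner's row estimate influences the choice.

```python
# GroupAggregate vs. HashAggregate, sketched on fake (keyword_id, impressions) rows.
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

rows = [(kid % 7, kid * 3 % 11) for kid in range(100_000)]

def group_aggregate(rows):
    # Sort by key, then sum each contiguous run -- the GroupAggregate shape.
    out = {}
    for key, grp in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        out[key] = sum(imp for _, imp in grp)
    return out

def hash_aggregate(rows):
    # One pass, accumulate into a hash table -- the HashAggregate shape.
    out = defaultdict(int)
    for key, imp in rows:
        out[key] += imp
    return dict(out)

assert group_aggregate(rows) == hash_aggregate(rows)  # same answers, different cost
```

The n·log(n) sort of ~885k rows is what the 14-second plan spends most of
its time on; the hash version avoids it entirely.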
On Fri, Aug 26, 2005 at 05:10:49PM -0700, asif ali wrote:
> "GroupAggregate  (cost=195623.66..206672.52 rows=20132 width=16) (actual time=8205.283..10139.369 rows=55291 loops=1)"
> "  ->  Sort  (cost=195623.66..198360.71 rows=1094820 width=16) (actual time=8205.114..9029.501 rows=863883 loops=1)"
> "        Sort Key: keyword_id"
> "        ->  Seq Scan on keyword_conversion_table c  (cost=0.00..29990.83 rows=1094820 width=16) (actual time=0.057..1422.319 rows=863883 loops=1)"
> "              Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"
> "Total runtime: 14683.617 ms"

What are your effective_cache_size and work_mem (8.x) or sort_mem (7.x)
settings? How much RAM does the machine have? If you have enough memory
then raising those variables should result in better plans; you might also
want to experiment with random_page_cost. Be careful not to set
work_mem/sort_mem too high, though. See "Run-time Configuration" in the
"Server Run-time Environment" chapter of the documentation for more
information about these variables.

-- 
Michael Fuhr
On Fri, 2005-08-26 at 19:31 -0400, Tom Lane wrote:
> I think the problem is that the planner is underestimating the cost of
> evaluating this complicated filter condition:
> [...]
> While you could attack that by raising the cpu_operator_cost parameter,
> it would also be worth inquiring *why* the condition is so expensive to
> evaluate. I am suspicious that you are running the database in a locale
> in which strcoll() is really slow. Can you run it in C locale instead,
> or do you really need locale-aware behavior? Can you switch to a
> different database encoding? (A single-byte encoding such as Latin1
> might be faster than UTF8, for example.)

Yes, you are perfectly right. We are using UTF8 and the tr_TR.UTF8 locale.
However, I tried the same tests with latin1 and the C locale; it is surely
faster, but not dramatically so, i.e.:

                          Before Vacuum    After Vacuum
    UTF8 and tr_TR.UTF8:      ~8 s             ~110 s
    latin1 and C:             ~7 s             ~65 s

I also played with the cpu_operator_cost parameter, and it dramatically
reduced query times, but not to the level before vacuum:

                          Before Vacuum    After Vacuum
    UTF8 and tr_TR.UTF8:      ~8 s             ~11 s
    latin1 and C:             ~7 s             ~9 s

These values are much better, but I really wonder if I can reach the
performance levels we had before the vacuum. I am also worried about
side-effects that may be caused by a non-default cpu_operator_cost
parameter.
> Another possibility is to take a hard look at whether you can't simplify
> the filter condition, but that'd require more knowledge of your
> application than I have.

Yes, that is another option; we are even considering schema changes to use
fewer character types, but these are really costly and error-prone
operations at the moment.

> Or you could just play with the order of the filter conditions ... for
> example, the date condition at the end is probably far cheaper to test
> than the text comparisons, so if that's fairly selective it'd be worth
> putting it first.

We are experimenting with this.

Thanks for your help!

Best Regards,
Umit Oztosun
On Fri, Aug 26, 2005 at 07:31:51PM -0400, Tom Lane wrote:
> Or you could just play with the order of the filter conditions ... for
> example, the date condition at the end is probably far cheaper to test
> than the text comparisons, so if that's fairly selective it'd be worth
> putting it first.

That's an interesting approach -- could the planner do such things itself?

/* Steinar */
-- 
Homepage: http://www.sesse.net/
On Sat, Aug 27, 2005 at 11:05:01AM -0400, Tom Lane wrote:
> It could, but it doesn't really have enough information. We don't
> currently have any model that some operators are more expensive than
> others. IIRC the only sort of reordering the current code will do
> in a filter condition list is to push clauses involving sub-SELECTs
> to the end.

I was more thinking along the lines of reordering "a AND/OR b" to
"b AND/OR a" if b has lower selectivity than a.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes: > On Sat, Aug 27, 2005 at 11:05:01AM -0400, Tom Lane wrote: >> It could, but it doesn't really have enough information. We don't >> currently have any model that some operators are more expensive than >> others. IIRC the only sort of reordering the current code will do >> in a filter condition list is to push clauses involving sub-SELECTs >> to the end. > I was more thinking along the lines of reordering "a AND/OR b" to "b AND/OR > a" if b has lower selectivity than a. Yeah, but if b is considerably more expensive to evaluate than a, that could still be a net loss. To do it correctly you really need to trade off cost of evaluation against selectivity, and the planner currently only knows something about the latter (and all too often, not enough :-(). I'd like to do this someday, but until we get some cost info in there I think it'd be a mistake to do much re-ordering of conditions. Currently the SQL programmer can determine what happens by writing his query carefully --- if we reorder based on selectivity only, we could make things worse, and there'd be no way to override it. regards, tom lane
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes: > On Fri, Aug 26, 2005 at 07:31:51PM -0400, Tom Lane wrote: >> Or you could just play with the order of the filter conditions ... for >> example, the date condition at the end is probably far cheaper to test >> than the text comparisons, so if that's fairly selective it'd be worth >> putting it first. > That's an interesting approach -- could the planner do such things itself? It could, but it doesn't really have enough information. We don't currently have any model that some operators are more expensive than others. IIRC the only sort of reordering the current code will do in a filter condition list is to push clauses involving sub-SELECTs to the end. regards, tom lane
Michael,

The database is on the same system. What I am doing is only "VACUUM
analyze conversion_table".

I did the same thing on a newly created database and got the same result:
after "VACUUM analyze" performance dropped. Please see this; the runtime
changes from "7755.115" to "14859.291" ms.

explain analyze select keyword_id, sum(daily_impressions) as daily_impressions,
sum(daily_clicks) as daily_clicks,
COALESCE(sum(daily_cpc::double precision), 0) as daily_cpc,
sum(daily_revenues) as daily_revenues,
sum(daily_actions) as daily_actions,
count(daily_cpc) as count
from conversion_table c
where c.conversion_date BETWEEN '2005-06-07' and '2005-08-17'
group by keyword_Id

"HashAggregate  (cost=18686.51..18686.54 rows=2 width=52) (actual time=7585.827..7720.370 rows=55717 loops=1)"
"  ->  Index Scan using conversion_table_pk on conversion_table c  (cost=0.00..18599.25 rows=4986 width=52) (actual time=0.129..2882.066 rows=885493 loops=1)"
"        Index Cond: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"
"Total runtime: 7755.115 ms"

VACUUM analyze conversion_table

explain analyze select keyword_id, sum(daily_impressions) as daily_impressions,
sum(daily_clicks) as daily_clicks,
COALESCE(sum(daily_cpc::double precision), 0) as daily_cpc,
sum(daily_revenues) as daily_revenues,
sum(daily_actions) as daily_actions,
count(daily_cpc) as count
from conversion_table c
where c.conversion_date BETWEEN '2005-06-07' and '2005-08-17'
group by keyword_Id

"GroupAggregate  (cost=182521.76..200287.99 rows=20093 width=37) (actual time=8475.580..12618.793 rows=55717 loops=1)"
"  ->  Sort  (cost=182521.76..184698.58 rows=870730 width=37) (actual time=8475.246..9418.068 rows=885493 loops=1)"
"        Sort Key: keyword_id"
"        ->  Seq Scan on conversion_table c  (cost=0.00..27336.12 rows=870730 width=37) (actual time=0.007..1520.788 rows=885493 loops=1)"
"              Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"
"Total runtime: 14859.291 ms"

---
Michael Fuhr <mike@fuhr.org> wrote:

> What are your effective_cache_size and work_mem (8.x) or sort_mem (7.x)
> settings? How much RAM does the machine have? If you have enough
> memory then raising those variables should result in better plans;
> you might also want to experiment with random_page_cost. Be careful
> not to set work_mem/sort_mem too high, though. See "Run-time
> Configuration" in the "Server Run-time Environment" chapter of the
> documentation for more information about these variables.
On Mon, Aug 29, 2005 at 11:07:17AM -0700, asif ali wrote:
> The database is on the same system. What I am doing is only "VACUUM
> analyze conversion_table".
>
> I did the same thing on a newly created database and got the same
> result: after "VACUUM analyze" performance dropped. Please see this;
> the runtime changes from "7755.115" to "14859.291" ms.

As has been pointed out a couple of times, you're getting a different plan
after VACUUM ANALYZE because the row count estimates are more accurate.
Unfortunately the more accurate estimates result in a query plan that's
slower than the plan for the less accurate estimates. PostgreSQL *thinks*
the plan will be faster but your results show that it isn't, so you might
need to adjust some of the planner's cost constants.

I asked some questions that you didn't answer, so I'll ask them again:

What's your effective_cache_size setting?
What's your work_mem (8.x) or sort_mem (7.x) setting?
What's your random_page_cost setting?
How much available RAM does the machine have?
What version of PostgreSQL are you running?

Various tuning guides give advice on how to set the above and other
configuration variables. Here's one such guide:

http://www.powerpostgresql.com/PerfList/

-- 
Michael Fuhr
Michael,

The effective_cache_size, random_page_cost, and work_mem settings were at
their defaults (commented out). I have changed these settings and now the
performance is better; see below.

"HashAggregate  (cost=42573.89..42925.52 rows=20093 width=37) (actual time=5273.984..5430.733 rows=55717 loops=1)"
"  ->  Seq Scan on keyword_conversion_table c  (cost=0.00..27336.12 rows=870730 width=37) (actual time=0.052..1405.576 rows=885493 loops=1)"
"        Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))"
"Total runtime: 5463.764 ms"

Thanks a lot

asif ali

--- Michael Fuhr <mike@fuhr.org> wrote:

> As has been pointed out a couple of times, you're getting a different
> plan after VACUUM ANALYZE because the row count estimates are more
> accurate. Unfortunately the more accurate estimates result in a query
> plan that's slower than the plan for the less accurate estimates.
> PostgreSQL *thinks* the plan will be faster but your results show that
> it isn't, so you might need to adjust some of the planner's cost
> constants.
>
> I asked some questions that you didn't answer, so I'll ask them again:
>
> What's your effective_cache_size setting?
> What's your work_mem (8.x) or sort_mem (7.x) setting?
> What's your random_page_cost setting?
> How much available RAM does the machine have?
> What version of PostgreSQL are you running?
>
> Various tuning guides give advice on how to set the above and other
> configuration variables. Here's one such guide:
>
> http://www.powerpostgresql.com/PerfList/