Thread: Queries taking ages in PG 8.1, have been much faster in PG<=8.0
Hi! I've got an urgent problem with an application which is evaluating a monthly survey; it's running quite a lot of queries like this: select SOURCE.NAME as TYPE, count(PARTICIPANT.SESSION_ID) as TOTAL from ( select PARTICIPANT.SESSION_ID from survey.PARTICIPANT, survey.ANSWER where PARTICIPANT.STATUS = 1 and date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month', now()-'1 month'::interval) and PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID and ANSWER.QUESTION_ID = 6 and ANSWER.VALUE = 1 ) as PARTICIPANT, survey.ANSWER, survey.HANDY_JAVA SOURCE where PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID and ANSWER.QUESTION_ID = 16 and ANSWER.VALUE = SOURCE.ID group by SOURCE.NAME, SOURCE.POSITION order by SOURCE.POSITION asc; My current PostgreSQL-version is "PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2". Up to 8.0, a query like this took a couple of seconds, maybe even up to a minute. In 8.1 a query like this will run from 30 minutes up to two hours to complete, depending on ist complexity. I've got autovaccum enabled and run a nightly vacuum analyze over all of my databases. Here's some information about the relevant tables: Table answer has got ~ 8.9M rows (estimated 8,872,130, counted 8,876,648), participant has got ~178K rows (estimated 178,165, counted 178,248), HANDY_JAVA has got three rows. This is the explain-analyze-output for the above: "Sort (cost=11383.09..11383.10 rows=3 width=16) (actual time=1952676.858..1952676.863 rows=3 loops=1)" " Sort Key: source."position"" " -> HashAggregate (cost=11383.03..11383.07 rows=3 width=16) (actual time=1952676.626..1952676.635 rows=3 loops=1)" " -> Nested Loop (cost=189.32..11383.00 rows=5 width=16) (actual time=6975.812..1952371.782 rows=9806 loops=1)" " -> Nested Loop (cost=3.48..3517.47 rows=42 width=20) (actual time=6819.716..15419.930 rows=9806 loops=1)" " -> Nested Loop (cost=3.48..1042.38 rows=738 width=16) (actual time=258.434..6233.039 rows=162723 loops=1)" " -> Seq Scan on handy_java source (cost=0.00..1.03 rows=3 width=14) (actual time=0.093..0.118 rows=3 loops=1)" " -> Bitmap Heap Scan on answer (cost=3.48..344.04 rows=246 width=8) (actual time=172.381..1820.499 rows=54241 loops=3)" " Recheck Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))" " -> Bitmap Index Scan on idx02_performance (cost=0.00..3.48 rows=246 width=0) (actual time=98.321..98.321 rows=54245 loops=3)" " Index Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))" " -> Index Scan using idx01_perf_0006 on participant (cost=0.00..3.34 rows=1 width=4) (actual time=0.049..0.050 rows=0 loops=162723)" " Index Cond: (participant.session_id = "outer".session_id)" " Filter: ((status = 1) AND (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '1 mon'::interval))))" " -> Bitmap Heap Scan on answer (cost=185.85..187.26 rows=1 width=4) (actual time=197.490..197.494 rows=1 loops=9806)" " Recheck Cond: (("outer".session_id = answer.session_id) AND (answer.question_id = 6) AND (answer.value = 1))" " -> BitmapAnd (cost=185.85..185.85 rows=1 width=0) (actual time=197.421..197.421 rows=0 loops=9806)" " -> Bitmap Index Scan on idx_answer_session_id (cost=0.00..2.83 rows=236 width=0) (actual time=0.109..0.109 rows=49 loops=9806)" " Index Cond: ("outer".session_id = answer.session_id)" " -> Bitmap Index Scan on idx02_performance (cost=0.00..182.77 rows=20629 width=0) (actual time=195.742..195.742 rows=165697 loops=9806)" " Index Cond: ((question_id = 6) AND (value = 1))" "Total runtime: 1952678.393 ms" I am really sorry, but currently I haven't got any 8.0-installation left, so I cannot provide the explain (analyze) output for 8.0. I fiddled a little with the statement and managed to speed things up quite a lot: select SOURCE.NAME as TYPE, count(ANSWER.SESSION_ID) as TOTAL from survey.ANSWER, survey.HANDY_JAVA SOURCE where ANSWER.QUESTION_ID = 16 and ANSWER.VALUE = SOURCE.ID and ANSWER.SESSION_ID in ( select PARTICIPANT.SESSION_ID from survey.PARTICIPANT, survey.ANSWER where PARTICIPANT.STATUS = 1 and date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month', now()-'1 month'::interval) and PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID and ANSWER.QUESTION_ID = 6 and ANSWER.VALUE = 1 ) group by SOURCE.NAME, SOURCE.POSITION order by SOURCE.POSITION asc; Here's the explain analyze output: "Sort (cost=27835.39..27835.39 rows=3 width=16) (actual time=9609.207..9609.212 rows=3 loops=1)" " Sort Key: source."position"" " -> HashAggregate (cost=27835.33..27835.36 rows=3 width=16) (actual time=9609.058..9609.067 rows=3 loops=1)" " -> Hash IN Join (cost=26645.78..27835.29 rows=5 width=16) (actual time=6374.436..9548.945 rows=9806 loops=1)" " Hash Cond: ("outer".session_id = "inner".session_id)" " -> Nested Loop (cost=3.48..1042.38 rows=738 width=16) (actual time=190.419..4817.977 rows=162704 loops=1)" " -> Seq Scan on handy_java source (cost=0.00..1.03 rows=3 width=14) (actual time=0.036..0.058 rows=3 loops=1)" " -> Bitmap Heap Scan on answer (cost=3.48..344.04 rows=246 width=8) (actual time=116.719..1390.931 rows=54235 loops=3)" " Recheck Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))" " -> Bitmap Index Scan on idx02_performance (cost=0.00..3.48 rows=246 width=0) (actual time=63.195..63.195 rows=54235 loops=3)" " Index Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))" " -> Hash (cost=26639.37..26639.37 rows=1174 width=8) (actual time=3906.831..3906.831 rows=9806 loops=1)" " -> Hash Join (cost=4829.24..26639.37 rows=1174 width=8) (actual time=464.011..3877.539 rows=9806 loops=1)" " Hash Cond: ("outer".session_id = "inner".session_id)" " -> Bitmap Heap Scan on answer (cost=182.76..21413.93 rows=20626 width=4) (actual time=273.839..2860.984 rows=165655 loops=1)" " Recheck Cond: ((question_id = 6) AND (value = 1))" " -> Bitmap Index Scan on idx02_performance (cost=0.00..182.76 rows=20626 width=0) (actual time=171.933..171.933 rows=165659 loops=1)" " Index Cond: ((question_id = 6) AND (value = 1))" " -> Hash (cost=4621.13..4621.13 rows=10141 width=4) (actual time=123.351..123.351 rows=11134 loops=1)" " -> Index Scan using idx01_perf_0005 on participant (cost=0.01..4621.13 rows=10141 width=4) (actual time=0.545..93.200 rows=11134 loops=1)" " Index Cond: (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '1 mon'::interval)))" " Filter: (status = 1)" "Total runtime: 9612.249 ms" Regarding the total runtime, this is roughly in the same dimension as for the original query in 8.0, as far as I remember. I haven't written these queries myself in the first place, but they were done in the dark ages when IN was a no-no and haven't been touched ever since - there hadn't been any need to do so. My current problem is that rewriting hundreds of queries, some of them quite a bit more complex than this one, but all of them using the same general scheme, would take quite a lot of time - and I'm expected to hand over the survey results ASAP. So I will obviously have to do a rewrite if there's just no other way, but I wondered if there might be some other option that would allow me to point the planner in the right direction so it would behave the same as in the previous versions, namely 8.0? Any suggestions? Kind regards Markus
"Markus Wollny" <Markus.Wollny@computec.de> writes: > My current problem is that rewriting hundreds of queries, some of them > quite a bit more complex than this one, but all of them using the same > general scheme, would take quite a lot of time - and I'm expected to > hand over the survey results ASAP. So I will obviously have to do a > rewrite if there's just no other way, but I wondered if there might be > some other option that would allow me to point the planner in the right > direction so it would behave the same as in the previous versions, > namely 8.0? It looks like "set enable_nestloop = 0" might be a workable hack for the immediate need. Once you're not under deadline, I'd like to investigate more closely to find out why 8.1 does worse than 8.0 here. regards, tom lane
> -----Ursprüngliche Nachricht----- > Von: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Gesendet: Donnerstag, 1. Dezember 2005 17:26 > An: Markus Wollny > Cc: pgsql-performance@postgresql.org > Betreff: Re: [PERFORM] Queries taking ages in PG 8.1, have > been much faster in PG<=8.0 > It looks like "set enable_nestloop = 0" might be a workable > hack for the immediate need. Whow - that works miracles :) "Sort (cost=81813.13..81813.14 rows=3 width=16) (actual time=7526.745..7526.751 rows=3 loops=1)" " Sort Key: source."position"" " -> HashAggregate (cost=81813.07..81813.11 rows=3 width=16) (actual time=7526.590..7526.601 rows=3 loops=1)" " -> Merge Join (cost=81811.40..81813.03 rows=5 width=16) (actual time=7423.289..7479.175 rows=9806 loops=1)" " Merge Cond: ("outer".id = "inner".value)" " -> Sort (cost=1.05..1.06 rows=3 width=14) (actual time=0.085..0.091 rows=3 loops=1)" " Sort Key: source.id" " -> Seq Scan on handy_java source (cost=0.00..1.03 rows=3 width=14) (actual time=0.039..0.049 rows=3loops=1)" " -> Sort (cost=81810.35..81811.81 rows=583 width=8) (actual time=7423.179..7440.062 rows=9806 loops=1)" " Sort Key: mafo.answer.value" " -> Hash Join (cost=27164.31..81783.57 rows=583 width=8) (actual time=6757.521..7360.822 rows=9806loops=1)" " Hash Cond: ("outer".session_id = "inner".session_id)" " -> Bitmap Heap Scan on answer (cost=506.17..54677.92 rows=88334 width=8) (actual time=379.245..2660.344rows=162809 loops=1)" " Recheck Cond: (question_id = 16)" " -> Bitmap Index Scan on idx_answer_question_id (cost=0.00..506.17 rows=88334 width=0)(actual time=274.632..274.632 rows=162814 loops=1)" " Index Cond: (question_id = 16)" " -> Hash (cost=26655.21..26655.21 rows=1175 width=8) (actual time=3831.362..3831.362 rows=9806loops=1)" " -> Hash Join (cost=4829.33..26655.21 rows=1175 width=8) (actual time=542.227..3800.985rows=9806 loops=1)" " Hash Cond: ("outer".session_id = "inner".session_id)" " -> Bitmap Heap Scan on answer (cost=182.84..21429.34 rows=20641 width=4) (actualtime=292.067..2750.376 rows=165762 loops=1)" " Recheck Cond: ((question_id = 6) AND (value = 1))" " -> Bitmap Index Scan on idx02_performance (cost=0.00..182.84 rows=20641 width=0)(actual time=167.306..167.306 rows=165769 loops=1)" " Index Cond: ((question_id = 6) AND (value = 1))" " -> Hash (cost=4621.13..4621.13 rows=10141 width=4) (actual time=182.842..182.842rows=11134 loops=1)" " -> Index Scan using idx01_perf_0005 on participant (cost=0.01..4621.13 rows=10141width=4) (actual time=0.632..136.126 rows=11134 loops=1)" " Index Cond: (date_trunc('month'::text, created) = date_trunc('month'::text,(now() - '1 mon'::interval)))" " Filter: (status = 1)" "Total runtime: 7535.398 ms" > Once you're not under deadline, > I'd like to investigate more closely to find out why 8.1 does > worse than 8.0 here. Please tell me what I can do to help in clearing up this issue, I'd be very happy to help! Heck, I am happy anyway that there'ssuch a quick fix, even if it's not a beautiful one :) Kind regards Markus
"Markus Wollny" <Markus.Wollny@computec.de> writes: >> Once you're not under deadline, >> I'd like to investigate more closely to find out why 8.1 does >> worse than 8.0 here. > Please tell me what I can do to help in clearing up this issue, I'd be > very happy to help! The first thing to do is get 8.0's EXPLAIN ANALYZE for the same query. After we see how that differs from 8.1, we'll know what the next question should be ... regards, tom lane
Hi!
> -----Ursprüngliche Nachricht-----
> Von: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Gesendet: Donnerstag, 1. Dezember 2005 17:26
> An: Markus Wollny
> Cc: pgsql-performance@postgresql.org
> Betreff: Re: [PERFORM] Queries taking ages in PG 8.1, have
> been much faster in PG<=8.0
> It looks like "set enable_nestloop = 0" might be a workable
> hack for the immediate need.
>
> Once you're not under deadline,
> I'd like to investigate more closely to find out why 8.1 does
> worse than 8.0 here.
I've just set up a PostgreSQL 8.0.3 installation ...
select version();
version
--------------------------------------------------------------------------------------------
PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
(1 row)
...and restored a dump there; here's the explain analyze of the query for 8.0.3:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=5193.63..5193.63 rows=3 width=16) (actual time=7365.107..7365.110 rows=3 loops=1)
Sort Key: source."position"
-> HashAggregate (cost=5193.59..5193.60 rows=3 width=16) (actual time=7365.034..7365.041 rows=3 loops=1)
-> Nested Loop (cost=0.00..5193.57 rows=3 width=16) (actual time=3190.642..7300.820 rows=11086 loops=1)
-> Nested Loop (cost=0.00..3602.44 rows=4 width=20) (actual time=3169.968..5875.153 rows=11087 loops=1)
-> Nested Loop (cost=0.00..1077.95 rows=750 width=16) (actual time=36.599..2778.129 rows=158288 loops=1)
-> Seq Scan on handy_java source (cost=0.00..1.03 rows=3 width=14) (actual time=6.503..6.514 rows=3 loops=1)
-> Index Scan using idx02_performance on answer (cost=0.00..355.85 rows=250 width=8) (actual time=10.071..732.746 rows=52763 loops=3)
Index Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))
-> Index Scan using pk_participant on participant (cost=0.00..3.35 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=158288)
Index Cond: (participant.session_id = "outer".session_id)
Filter: ((status = 1) AND (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '2 mons'::interval))))
-> Index Scan using idx_answer_session_id on answer (cost=0.00..397.77 rows=1 width=4) (actual time=0.080..0.122 rows=1 loops=11087)
Index Cond: ("outer".session_id = answer.session_id)
Filter: ((question_id = 6) AND (value = 1))
Total runtime: 7365.461 ms
(16 rows)
Does this tell you anything useful? It's not on the same machine, mind you, but configuration for PostgreSQL is absolutely identical (apart from the autovacuum-lines which 8.0.3 doesn't like).
Kind regards
Markus
"Markus Wollny" <Markus.Wollny@computec.de> writes: >> Once you're not under deadline, >> I'd like to investigate more closely to find out why 8.1 does >> worse than 8.0 here. > Does this tell you anything useful? It's not on the same machine, mind > you, but configuration for PostgreSQL is absolutely identical (apart > from the autovacuum-lines which 8.0.3 doesn't like). The data is not quite the same, right? I notice different numbers of rows being returned. But anyway, it seems the problem is with the upper scan on "answers", which 8.0 does like this: -> Index Scan using idx_answer_session_id on answer (cost=0.00..397.77 rows=1 width=4) (actual time=0.080..0.122 rows=1loops=11087) Index Cond: ("outer".session_id = answer.session_id) Filter: ((question_id = 6) AND (value = 1)) and 8.1 does like this: -> Bitmap Heap Scan on answer (cost=185.85..187.26 rows=1 width=4) (actual time=197.490..197.494 rows=1 loops=9806) Recheck Cond: (("outer".session_id = answer.session_id) AND (answer.question_id = 6) AND (answer.value = 1)) -> BitmapAnd (cost=185.85..185.85 rows=1 width=0) (actual time=197.421..197.421 rows=0 loops=9806) -> Bitmap Index Scan on idx_answer_session_id (cost=0.00..2.83 rows=236 width=0) (actual time=0.109..0.109rows=49 loops=9806) Index Cond: ("outer".session_id = answer.session_id) -> Bitmap Index Scan on idx02_performance (cost=0.00..182.77 rows=20629 width=0) (actual time=195.742..195.742rows=165697 loops=9806) Index Cond: ((question_id = 6) AND (value = 1)) It seems that checking question_id/value via the index, rather than directly on the fetched tuple, is a net loss here. It looks like 8.1 would have made the right plan choice if it had made a better estimate of the combined selectivity of the question_id and value conditions, so ultimately this is another manifestation of the lack of cross-column statistics. What I find interesting though is that the plain index scan in 8.0 is so enormously cheaper than it's estimated to be. Perhaps the answer table in your 8.0 installation is almost perfectly ordered by session_id? Are you using default values for the planner cost parameters? It looks like reducing random_page_cost would help bring the planner estimates into line with reality on your machines. regards, tom lane
Hi! > -----Ursprüngliche Nachricht----- > Von: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Gesendet: Sonntag, 4. Dezember 2005 19:32 > An: Markus Wollny > Cc: pgsql-performance@postgresql.org > Betreff: Re: [PERFORM] Queries taking ages in PG 8.1, have > been much faster in PG<=8.0 > The data is not quite the same, right? I notice different > numbers of rows being returned. No, you're right, I didn't manage to restore the 8.1 dump into the 8.0.3 cluster, so I took the quick route and restoredthe last dump from my 8.0 installation. The numbers should be roughly within the same range, though: Table answer has got 8,646,320 rows (counted and estimated, as this db is not live, obviously), table participant has got173,998 rows; for comparison: The production db had an estimated 8,872,130, counted 8,876,648 rows for table answer, and estimated 178,165, counted 178,248rows for participant. As the numbers are a mere 2% apart, I should think that this wouldn't make that much difference. > It seems that checking question_id/value via the index, > rather than directly on the fetched tuple, is a net loss > here. It looks like 8.1 would have made the right plan > choice if it had made a better estimate of the combined > selectivity of the question_id and value conditions, so > ultimately this is another manifestation of the lack of > cross-column statistics. What I find interesting though is > that the plain index scan in 8.0 is so enormously cheaper > than it's estimated to be. Perhaps the answer table in your > 8.0 installation is almost perfectly ordered by session_id? Not quite - there may be several concurrent sessions at any one time, but ordinarily the answers for one session-id wouldbe quite close together, in a lot of cases even in perfect sequence, so "almost perfectly" might be a fair description,depending on the exact definition of "almost" :) > Are you using default values for the planner cost parameters? I have to admit that I did tune the random_page_cost and effective_cache_size settings ages ago (7.1-ish) to a value thatseemed to work best then - and didn't touch it ever since, although my data pool has grown quite a bit over time. cpu_tuple_cost,cpu_index_tuple_cost and cpu_operator_cost are using default values. > It looks like reducing random_page_cost would help bring the > planner estimates into line with reality on your machines. I had set random_page_cost to 1.4 already, so I doubt that it would do much good to further reduce the value - reading thedocs and the suggestions for tuning I would have thought that I should actually consider increasing this value a bit,as not all of my data will fit in memory any more. Do you nevertheless want me to try what happens if I reduce random_page_costeven further? Kind regards Markus
"Markus Wollny" <Markus.Wollny@computec.de> writes: >> ... What I find interesting though is >> that the plain index scan in 8.0 is so enormously cheaper >> than it's estimated to be. Perhaps the answer table in your >> 8.0 installation is almost perfectly ordered by session_id? > Not quite - there may be several concurrent sessions at any one time, but ordinarily the answers for one session-id wouldbe quite close together, in a lot of cases even in perfect sequence, so "almost perfectly" might be a fair description,depending on the exact definition of "almost" :) Could we see the pg_stats row for answer.session_id in both 8.0 and 8.1? > I had set random_page_cost to 1.4 already, so I doubt that it would do much good to further reduce the value - readingthe docs and the suggestions for tuning I would have thought that I should actually consider increasing this valuea bit, as not all of my data will fit in memory any more. Do you nevertheless want me to try what happens if I reducerandom_page_cost even further? No, that's probably quite low enough already ... regards, tom lane
> -----Ursprüngliche Nachricht----- > Von: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Gesendet: Montag, 5. Dezember 2005 15:33 > An: Markus Wollny > Cc: pgsql-performance@postgresql.org > Betreff: Re: AW: [PERFORM] Queries taking ages in PG 8.1, > have been much faster in PG<=8.0 > Could we see the pg_stats row for answer.session_id in both > 8.0 and 8.1? Here you are: select null_frac , avg_width , n_distinct , most_common_vals , most_common_freqs , histogram_bounds , Correlation from pg_stats where schemaname = 'survey' and tablename = 'answer' and attname = 'session_id'; 8.1: null_frac 0 avg_width 4 n_distinct 33513 most_common_vals {1013854,1017890,1021551,1098817,764249,766938,776353,780954,782232,785985} most_common_freqs {0.001,0.001,0.001,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667} histogram_bounds {757532,819803,874935,938170,1014421,1081507,1164659,1237281,1288267,1331016,1368939} Correlation -0.0736492 8.0.3: null_frac 0 avg_width 4 n_distinct 29287 most_common_vals {765411,931762,983933,1180453,1181959,1229963,1280249,1288736,1314970,764901} most_common_freqs {0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.000666667} histogram_bounds {757339,822949,875834,939085,1004782,1065251,1140682,1218336,1270024,1312170,1353082} Correlation -0.237136 Kind regards Markus
"Markus Wollny" <Markus.Wollny@computec.de> writes: >> Could we see the pg_stats row for answer.session_id in both >> 8.0 and 8.1? > Here you are: > 8.1: > Correlation -0.0736492 > 8.0.3: > Correlation -0.237136 Interesting --- if the 8.1 database is a dump and restore of the 8.0, you'd expect the physical ordering to be similar. Why is 8.1 showing a significantly lower correlation? That has considerable impact on the estimated cost of an indexscan (plain not bitmap), and so it might explain why 8.1 is mistakenly avoiding the indexscan ... regards, tom lane
> -----Ursprüngliche Nachricht----- > Von: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Gesendet: Montag, 5. Dezember 2005 16:12 > An: Markus Wollny > Cc: pgsql-performance@postgresql.org > Betreff: Re: AW: AW: [PERFORM] Queries taking ages in PG 8.1, > have been much faster in PG<=8.0 > > "Markus Wollny" <Markus.Wollny@computec.de> writes: > >> Could we see the pg_stats row for answer.session_id in > both 8.0 and > >> 8.1? > > > Here you are: > > > 8.1: > > Correlation -0.0736492 > > > 8.0.3: > > Correlation -0.237136 > > Interesting --- if the 8.1 database is a dump and restore of > the 8.0, you'd expect the physical ordering to be similar. I dumped the data from my 8.0.1 cluster on 2005-11-18 00:23 using pg_dumpall with no further options; the dump was passedthrough iconv to clear up some UTF-8 encoding issues, then restored into a fresh 8.1 cluster where it went productive;I used the very same dump to restore the 8.0.3 cluster. So there is a difference between the two datasets, anadditional 230.328 rows in the answers-table. > Why is 8.1 showing a significantly lower correlation? That > has considerable impact on the estimated cost of an indexscan > (plain not bitmap), and so it might explain why 8.1 is > mistakenly avoiding the indexscan ... I just ran a vacuum analyze on the table, just to make sure that the stats are up to date (forgot that on the previous run,thanks to pg_autovacuum...), and the current correlation on the 8.1 installation is now calculated as -0.158921. That'sstill more than twice the value as for the 8.0-db. I don't know whether that is significant, though. Kind regards Markus
> -----Ursprüngliche Nachricht----- > Von: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org] Im Auftrag > von Markus Wollny > Gesendet: Montag, 5. Dezember 2005 16:41 > An: Tom Lane > Cc: pgsql-performance@postgresql.org > Betreff: Re: [PERFORM] Queries taking ages in PG 8.1, have > been much faster in PG<=8.0 > an additional 230.328 rows in the answers-table. That was supposed to read 230,328 rows, sorry.
On Mon, 05 Dec 2005 10:11:41 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Correlation -0.0736492 >> Correlation -0.237136 >That has considerable impact on the >estimated cost of an indexscan The cost estimator uses correlationsquared. So all correlations between -0.3 and +0.3 can be considered equal under the assumption that estimation errors of up to 10% are acceptable. Servus Manfred