Thread: Bad choice of query plan from PG 7.3.6 to PG 7.3.9 part 1
Hi,

I'm currently experiencing problems with long query execution times. What I believe makes these problems particularly interesting is the difference in execution plans between our test server running PostgreSQL 7.3.6 and our production server running PostgreSQL 7.3.9.

The test server is an upgraded "home machine", a Pentium 4 with 1GB of memory and an IDE disk. The production server is a dual-CPU Xeon Pentium 4 with 2GB of memory and SCSI disks. One should expect the production server to be faster, but apparently not, as the query plans outlined below show.

My questions can be summed up as follows:

1) How come the query plans on the two servers are different?
2) How come the production server in general estimates the cost of the query plans so horribly wrong? (I.e. it chooses a bad query plan whereas the test server chooses a good plan.)
3) In Query 2, how come the production server refuses to use its indexes (subcat_uq and aff_price_uq, both unique indexes) whereas the test server determines that the indexes are the way to go?
4) In Query 3, how come the test server refuses to use its index (sct2lang_uq) whereas the production server uses it? And why is the test server still faster even though it makes a sequential scan of a table with 8.5k records?

Please note that a VACUUM ANALYZE is run on the production server once a day (it used to be once an hour, but that seemed to make no difference); however, there are generally no writes to the tables used in the queries.

If anyone could shed some light on these issues I would truly appreciate it.

Cheers
Jona

PS: Please refer to part 2 for the other queries and query plans.

----------------------------------------------------------------------------------------------------

Query 1:

EXPLAIN ANALYZE
SELECT DISTINCT StatConTrans_Tbl.id, Code_Tbl.sysnm AS code, PriceCat_Tbl.amount AS price,
                Country_Tbl.currency, CreditsCat_Tbl.amount AS credits,
                Info_Tbl.title, Info_Tbl.description
FROM (SCT2SubCatType_Tbl
      INNER JOIN SCT2Lang_Tbl ON SCT2SubCatType_Tbl.sctid = SCT2Lang_Tbl.sctid
      INNER JOIN Language_Tbl ON SCT2Lang_Tbl.langid = Language_Tbl.id AND Language_Tbl.sysnm = UPPER('us') AND Language_Tbl.enabled = true
      INNER JOIN Info_Tbl ON SCT2SubCatType_Tbl.sctid = Info_Tbl.sctid AND Language_Tbl.id = Info_Tbl.langid
      INNER JOIN SubCatType_Tbl ON SCT2SubCatType_Tbl.subcattpid = SubCatType_Tbl.id AND SubCatType_Tbl.enabled = true
      INNER JOIN CatType_Tbl ON SubCatType_Tbl.cattpid = CatType_Tbl.id AND CatType_Tbl.enabled = true
      INNER JOIN SuperCatType_Tbl ON CatType_Tbl.spcattpid = SuperCatType_Tbl.id AND SuperCatType_Tbl.enabled = true
      INNER JOIN StatConTrans_Tbl ON SCT2SubCatType_Tbl.sctid = StatConTrans_Tbl.id AND StatConTrans_Tbl.enabled = true
      INNER JOIN Price_Tbl ON StatConTrans_Tbl.id = Price_Tbl.sctid AND Price_Tbl.affid = 8
      INNER JOIN PriceCat_Tbl ON Price_Tbl.prccatid = PriceCat_Tbl.id AND PriceCat_Tbl.enabled = true
      INNER JOIN Country_Tbl ON PriceCat_Tbl.cntid = Country_Tbl.id AND Country_Tbl.enabled = true
      INNER JOIN CreditsCat_Tbl ON Price_Tbl.crdcatid = CreditsCat_Tbl.id AND CreditsCat_Tbl.enabled = true
      INNER JOIN StatCon_Tbl ON StatConTrans_Tbl.id = StatCon_Tbl.sctid AND StatCon_Tbl.ctpid = 1
      INNER JOIN Code_Tbl ON SuperCatType_Tbl.id = Code_Tbl.spcattpid AND Code_Tbl.affid = 8 AND Code_Tbl.cdtpid = 1)
WHERE SCT2SubCatType_Tbl.subcattpid = 79
ORDER BY StatConTrans_Tbl.id DESC
LIMIT 8 OFFSET 0

Plan on PostgreSQL 7.3.6 on Red Hat Linux 3.2.3-39:

Limit (cost=178.59..178.61 rows=1 width=330) (actual time=22.77..28.51 rows=4 loops=1)
  -> Unique (cost=178.59..178.61 rows=1 width=330) (actual time=22.77..28.50 rows=4 loops=1)
    -> Sort (cost=178.59..178.60 rows=1 width=330) (actual time=22.76..22.85 rows=156 loops=1)
       Sort Key: statcontrans_tbl.id, code_tbl.sysnm, pricecat_tbl.amount, country_tbl.currency, creditscat_tbl.amount, info_tbl.title, info_tbl.description
      -> Hash Join (cost=171.19..178.58 rows=1 width=330) (actual time=3.39..6.55 rows=156 loops=1)
         Hash Cond: ("outer".cntid = "inner".id)
        -> Nested Loop (cost=170.13..177.51 rows=1 width=312) (actual time=3.27..5.75 rows=156 loops=1)
           Join Filter: ("inner".sctid = "outer".sctid)
          -> Hash Join (cost=170.13..171.48 rows=1 width=308) (actual time=3.12..3.26 rows=4 loops=1)
             Hash Cond: ("outer".crdcatid = "inner".id)
            -> Hash Join (cost=169.03..170.38 rows=1 width=300) (actual time=3.00..3.11 rows=4 loops=1)
               Hash Cond: ("outer".spcattpid = "inner".spcattpid)
              -> Hash Join (cost=167.22..168.56 rows=1 width=253) (actual time=2.88..2.97 rows=4 loops=1)
                 Hash Cond: ("outer".id = "inner".prccatid)
                -> Seq Scan on pricecat_tbl (cost=0.00..1.29 rows=12 width=12) (actual time=0.04..0.08 rows=23 loops=1)
                   Filter: (enabled = true)
                -> Hash (cost=167.21..167.21 rows=1 width=241) (actual time=2.80..2.80 rows=0 loops=1)
                  -> Nested Loop (cost=3.77..167.21 rows=1 width=241) (actual time=1.31..2.79 rows=4 loops=1)
                     Join Filter: ("inner".sctid = "outer".sctid)
                    -> Nested Loop (cost=3.77..161.19 rows=1 width=229) (actual time=1.19..2.60 rows=4 loops=1)
                       Join Filter: ("outer".sctid = "inner".sctid)
                      -> Hash Join (cost=3.77..155.17 rows=1 width=44) (actual time=1.07..2.37 rows=4 loops=1)
                         Hash Cond: ("outer".langid = "inner".id)
                        -> Nested Loop (cost=2.69..154.06 rows=7 width=40) (actual time=0.90..2.18 rows=8 loops=1)
                           Join Filter: ("outer".sctid = "inner".sctid)
                          -> Nested Loop (cost=2.69..21.30 rows=1 width=32) (actual time=0.78..1.94 rows=4 loops=1)
                            -> Nested Loop (cost=2.69..15.30 rows=1 width=28) (actual time=0.66..1.76 rows=4 loops=1)
                              -> Hash Join (cost=2.69..7.07 rows=1 width=20) (actual time=0.39..1.15 rows=154 loops=1)
                                 Hash Cond: ("outer".cattpid = "inner".id)
                                -> Seq Scan on subcattype_tbl (cost=0.00..3.98 rows=79 width=8) (actual time=0.03..0.35 rows=156 loops=1)
                                   Filter: (enabled = true)
                                -> Hash (cost=2.68..2.68 rows=3 width=12) (actual time=0.31..0.31 rows=0 loops=1)
                                  -> Hash Join (cost=1.15..2.68 rows=3 width=12) (actual time=0.16..0.27 rows=31 loops=1)
                                     Hash Cond: ("outer".spcattpid = "inner".id)
                                    -> Seq Scan on cattype_tbl (cost=0.00..1.41 rows=16 width=8) (actual time=0.04..0.09 rows=31 loops=1)
                                       Filter: (enabled = true)
                                    -> Hash (cost=1.14..1.14 rows=6 width=4) (actual time=0.06..0.06 rows=0 loops=1)
                                      -> Seq Scan on supercattype_tbl (cost=0.00..1.14 rows=6 width=4) (actual time=0.03..0.05 rows=10 loops=1)
                                         Filter: (enabled = true)
                              -> Index Scan using subcat_uq on sct2subcattype_tbl (cost=0.00..5.97 rows=1 width=8) (actual time=0.00..0.00 rows=0 loops=154)
                                 Index Cond: ((sct2subcattype_tbl.subcattpid = "outer".id) AND (sct2subcattype_tbl.subcattpid = 79))
                            -> Index Scan using statcontrans_pk on statcontrans_tbl (cost=0.00..5.99 rows=1 width=4) (actual time=0.04..0.04 rows=1 loops=4)
                               Index Cond: ("outer".sctid = statcontrans_tbl.id)
                               Filter: (enabled = true)
                          -> Index Scan using sct2lang_uq on sct2lang_tbl (cost=0.00..132.22 rows=43 width=8) (actual time=0.04..0.05 rows=2 loops=4)
                             Index Cond: ("outer".id = sct2lang_tbl.sctid)
                        -> Hash (cost=1.07..1.07 rows=1 width=4) (actual time=0.11..0.11 rows=0 loops=1)
                          -> Seq Scan on language_tbl (cost=0.00..1.07 rows=1 width=4) (actual time=0.10..0.11 rows=1 loops=1)
                             Filter: (((sysnm)::text = 'US'::text) AND (enabled = true))
                      -> Index Scan using info_uq on info_tbl (cost=0.00..6.00 rows=1 width=185) (actual time=0.05..0.05 rows=1 loops=4)
                         Index Cond: ((info_tbl.sctid = "outer".sctid) AND (info_tbl.langid = "outer".langid))
                    -> Index Scan using aff_price_uq on price_tbl (cost=0.00..6.01 rows=1 width=12) (actual time=0.03..0.03 rows=1 loops=4)
                       Index Cond: ((price_tbl.affid = 8) AND (price_tbl.sctid = "outer".sctid))
              -> Hash (cost=1.81..1.81 rows=1 width=47) (actual time=0.08..0.08 rows=0 loops=1)
                -> Seq Scan on code_tbl (cost=0.00..1.81 rows=1 width=47) (actual time=0.04..0.07 rows=5 loops=1)
                   Filter: ((affid = 8) AND (cdtpid = 1))
            -> Hash (cost=1.09..1.09 rows=4 width=8) (actual time=0.06..0.06 rows=0 loops=1)
              -> Seq Scan on creditscat_tbl (cost=0.00..1.09 rows=4 width=8) (actual time=0.03..0.04 rows=7 loops=1)
                 Filter: (enabled = true)
          -> Index Scan using ctp_statcon on statcon_tbl (cost=0.00..6.01 rows=1 width=4) (actual time=0.05..0.31 rows=39 loops=4)
             Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))
        -> Hash (cost=1.06..1.06 rows=2 width=18) (actual time=0.06..0.06 rows=0 loops=1)
          -> Seq Scan on country_tbl (cost=0.00..1.06 rows=2 width=18) (actual time=0.04..0.05 rows=4 loops=1)
             Filter: (enabled = true)
Total runtime: 29.56 msec

Plan on PostgreSQL 7.3.9 on Red Hat Linux 3.2.3-49:

Limit (cost=545.53..545.60 rows=1 width=135) (actual time=1251.71..1261.25 rows=4 loops=1)
  -> Unique (cost=545.53..545.60 rows=1 width=135) (actual time=1251.71..1261.24 rows=4 loops=1)
    -> Sort (cost=545.53..545.54 rows=4 width=135) (actual time=1251.70..1251.90 rows=156 loops=1)
       Sort Key: statcontrans_tbl.id, code_tbl.sysnm, pricecat_tbl.amount, country_tbl.currency, creditscat_tbl.amount, info_tbl.title, info_tbl.description
      -> Nested Loop (cost=485.61..545.49 rows=4 width=135) (actual time=603.77..1230.96 rows=156 loops=1)
         Join Filter: ("inner".sctid = "outer".sctid)
        -> Hash Join (cost=485.61..486.06 rows=3 width=131) (actual time=541.87..542.22 rows=4 loops=1)
           Hash Cond: ("outer".crdcatid = "inner".id)
          -> Hash Join (cost=484.51..484.90 rows=3 width=123) (actual time=529.09..529.36 rows=4 loops=1)
             Hash Cond: ("outer".spcattpid = "inner".spcattpid)
            -> Hash Join (cost=482.68..482.93 rows=3 width=114) (actual time=517.60..517.77 rows=4 loops=1)
               Hash Cond: ("outer".cntid = "inner".id)
              -> Merge Join (cost=481.60..481.80 rows=4 width=105) (actual time=517.36..517.43 rows=4 loops=1)
                 Merge Cond: ("outer".id = "inner".prccatid)
                -> Sort (cost=1.81..1.87 rows=23 width=12) (actual time=8.44..8.45 rows=6 loops=1)
                   Sort Key: pricecat_tbl.id
                  -> Seq Scan on pricecat_tbl (cost=0.00..1.29 rows=23 width=12) (actual time=8.31..8.37 rows=23 loops=1)
                     Filter: (enabled = true)
                -> Sort (cost=479.80..479.81 rows=4 width=93) (actual time=508.87..508.87 rows=4 loops=1)
                   Sort Key: price_tbl.prccatid
                  -> Nested Loop (cost=13.69..479.75 rows=4 width=93) (actual time=444.70..508.81 rows=4 loops=1)
                     Join Filter: ("inner".sctid = "outer".sctid)
                    -> Nested Loop (cost=13.69..427.04 rows=9 width=81) (actual time=444.60..508.62 rows=4 loops=1)
                       Join Filter: ("outer".sctid = "inner".sctid)
                      -> Nested Loop (cost=13.69..377.03 rows=8 width=44) (actual time=345.13..398.38 rows=4 loops=1)
                         Join Filter: ("outer".sctid = "inner".id)
                        -> Hash Join (cost=13.69..327.32 rows=8 width=40) (actual time=219.17..272.27 rows=4 loops=1)
                           Hash Cond: ("outer".langid = "inner".id)
                          -> Nested Loop (cost=12.61..325.92 rows=42 width=36) (actual time=209.77..262.79 rows=8 loops=1)
                            -> Hash Join (cost=12.61..106.32 rows=27 width=28) (actual time=101.88..102.00 rows=4 loops=1)
                               Hash Cond: ("outer".cattpid = "inner".id)
                              -> Hash Join (cost=9.47..102.68 rows=33 width=16) (actual time=84.14..84.21 rows=4 loops=1)
                                 Hash Cond: ("outer".subcattpid = "inner".id)
                                -> Index Scan using subcat_uq on sct2subcattype_tbl (cost=0.00..92.56 rows=33 width=8) (actual time=83.33..83.37 rows=4 loops=1)
                                   Index Cond: (subcattpid = 79)
                                -> Hash (cost=3.98..3.98 rows=156 width=8) (actual time=0.76..0.76 rows=0 loops=1)
                                  -> Seq Scan on subcattype_tbl (cost=0.00..3.98 rows=156 width=8) (actual time=0.03..0.49 rows=156 loops=1)
                                     Filter: (enabled = true)
                              -> Hash (cost=3.07..3.07 rows=27 width=12) (actual time=17.58..17.58 rows=0 loops=1)
                                -> Hash Join (cost=1.16..3.07 rows=27 width=12) (actual time=17.30..17.52 rows=31 loops=1)
                                   Hash Cond: ("outer".spcattpid = "inner".id)
                                  -> Seq Scan on cattype_tbl (cost=0.00..1.41 rows=31 width=8) (actual time=0.02..0.12 rows=31 loops=1)
                                     Filter: (enabled = true)
                                  -> Hash (cost=1.14..1.14 rows=10 width=4) (actual time=17.09..17.09 rows=0 loops=1)
                                    -> Seq Scan on supercattype_tbl (cost=0.00..1.14 rows=10 width=4) (actual time=17.05..17.07 rows=10 loops=1)
                                       Filter: (enabled = true)
                            -> Index Scan using sct2lang_uq on sct2lang_tbl (cost=0.00..8.13 rows=2 width=8) (actual time=26.97..40.18 rows=2 loops=4)
                               Index Cond: ("outer".sctid = sct2lang_tbl.sctid)
                          -> Hash (cost=1.07..1.07 rows=1 width=4) (actual time=9.04..9.04 rows=0 loops=1)
                            -> Seq Scan on language_tbl (cost=0.00..1.07 rows=1 width=4) (actual time=9.02..9.03 rows=1 loops=1)
                               Filter: (((sysnm)::text = 'US'::text) AND (enabled = true))
                        -> Index Scan using statcontrans_pk on statcontrans_tbl (cost=0.00..5.88 rows=1 width=4) (actual time=31.51..31.52 rows=1 loops=4)
                           Index Cond: (statcontrans_tbl.id = "outer".sctid)
                           Filter: (enabled = true)
                      -> Index Scan using info_uq on info_tbl (cost=0.00..5.93 rows=1 width=37) (actual time=27.54..27.54 rows=1 loops=4)
                         Index Cond: ((info_tbl.sctid = "outer".sctid) AND (info_tbl.langid = "outer".langid))
                    -> Index Scan using aff_price_uq on price_tbl (cost=0.00..5.88 rows=1 width=12) (actual time=0.03..0.03 rows=1 loops=4)
                       Index Cond: ((price_tbl.affid = 8) AND (price_tbl.sctid = "outer".sctid))
              -> Hash (cost=1.06..1.06 rows=4 width=9) (actual time=0.05..0.05 rows=0 loops=1)
                -> Seq Scan on country_tbl (cost=0.00..1.06 rows=4 width=9) (actual time=0.02..0.03 rows=4 loops=1)
                   Filter: (enabled = true)
            -> Hash (cost=1.81..1.81 rows=8 width=9) (actual time=11.31..11.31 rows=0 loops=1)
              -> Seq Scan on code_tbl (cost=0.00..1.81 rows=8 width=9) (actual time=11.24..11.29 rows=5 loops=1)
                 Filter: ((affid = 8) AND (cdtpid = 1))
          -> Hash (cost=1.09..1.09 rows=7 width=8) (actual time=12.59..12.59 rows=0 loops=1)
            -> Seq Scan on creditscat_tbl (cost=0.00..1.09 rows=7 width=8) (actual time=12.55..12.57 rows=7 loops=1)
               Filter: (enabled = true)
        -> Index Scan using ctp_statcon on statcon_tbl (cost=0.00..20.40 rows=5 width=4) (actual time=27.97..171.84 rows=39 loops=4)
           Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))
Total runtime: 1299.02 msec
Jona <jonanews@oismail.com> writes:
> I'm currently experiencing problems with long query execution times.
> What I believe makes these problems particularly interesting is the
> difference in execution plans between our test server running PostgreSQL
> 7.3.6 and our production server running PostgreSQL 7.3.9.
> The test server is an upgraded "home machine", a Pentium 4 with 1GB of
> memory and an IDE disk.
> The production server is a dual-CPU Xeon Pentium 4 with 2GB memory and
> SCSI disks.
> One should expect the production server to be faster, but apparently
> not, as the query plans outlined below show.

I think the plans are fine; it looks to me like the production server has serious table-bloat or index-bloat problems, probably because of inadequate vacuuming. For instance, compare these entries:

-> Index Scan using ctp_statcon on statcon_tbl (cost=0.00..6.01 rows=1 width=4) (actual time=0.05..0.31 rows=39 loops=4)
   Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))

-> Index Scan using ctp_statcon on statcon_tbl (cost=0.00..20.40 rows=5 width=4) (actual time=27.97..171.84 rows=39 loops=4)
   Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))

Appears to be exactly the same task ... but the test server spent 1.24 msec total while the production server spent 687.36 msec total (the per-loop actual time multiplied by loops=4: 0.31 ms x 4 versus 171.84 ms x 4). That's more than half of your problem right there. Some of the other scans seem a lot slower on the production machine too.

> 1) How come the query plans between the 2 servers are different?

The production server's rowcount estimates are pretty good, the test server's are not. How long since you vacuumed/analyzed the test server?

It'd be interesting to see the output of "vacuum verbose statcon_tbl" on both servers ...

        regards, tom lane

PS: if you post any more query plans, please try to use software that doesn't mangle the formatting so horribly ...
Thank you for the swift reply.
The test server is hardly ever vacuumed, as it generally sees very limited traffic. VACUUM is only necessary if the server sees a lot of write operations (i.e. UPDATE, DELETE, INSERT), right?
What explains the different choice of query plans then?
As can be seen from the following snippets, the test server decides to use an index twice in Query 2, whereas the live server decides to do a full scan of tables holding 38.5k and 5.5k records.
In Query 3 it's vice versa.
Seems strange to me...
Query 2:
------------------- Bad idea, price_tbl holds 38.5k records
Test:
-> Index Scan using aff_price_uq on price_tbl (cost=0.00..6.01 rows=1 width=4) (actual time=0.01..0.01 rows=1 loops=2838)
Live:
-> Seq Scan on price_tbl (cost=0.00..883.48 rows=2434 width=4) (actual time=0.86..67.25 rows=4570 loops=1)
   Filter: (affid = 8)
------------------- Bad idea, sct2subcattype_tbl holds 5.5k records
Test:
-> Index Scan using subcat_uq on sct2subcattype_tbl (cost=0.00..79.26 rows=26 width=8) (actual time=0.01..0.17 rows=59 loops=48)
Live:
-> Seq Scan on sct2subcattype_tbl (cost=0.00..99.26 rows=5526 width=8) (actual time=0.01..30.16 rows=5526 loops=1)
Query 3:
----------------- Bad idea, sct2lang_tbl holds 8.6k records
Test:
-> Seq Scan on sct2lang_tbl (cost=0.00..150.79 rows=8679 width=8) (actual time=0.03..10.70 rows=8679 loops=1)
Live:
-> Index Scan using sct2lang_uq on sct2lang_tbl (cost=0.00..8.13 rows=2 width=8) (actual time=1.10..2.39 rows=2 loops=69)
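(For reference: one standard way to tell whether choices like these come from the planner's cost estimates rather than from the data is to disable sequential scans for a single session and re-run the query for comparison; enable_seqscan is a stock planner toggle that exists in 7.3. A sketch:

SET enable_seqscan = off;
EXPLAIN ANALYZE ... ;   -- re-run the query in question here
SET enable_seqscan = on;

If the forced index plan is much faster, the cost estimates, not the plan shapes, are the thing to chase.)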
Will get a VACUUM VERBOSE of StatCon_Tbl
Cheers
Jona
PS: The query plans are extracted using pgAdmin on Windows; if you can recommend a better cross-platform PostgreSQL client, I'd be happy to try it out.
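(For what it's worth, the stock psql client doesn't add the stray quoting: run the query there and redirect the output to a file with \o. A sketch; plan.txt is just an example file name:

comm=# \o plan.txt
comm=# EXPLAIN ANALYZE SELECT ... ;   -- the query under test
comm=# \o

The captured plan pastes cleanly into mail.)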
Results of VACUUM VERBOSE from both servers
Test server:
comm=# VACUUM VERBOSE StatCon_Tbl;
INFO: --Relation public.statcon_tbl--
INFO: Pages 338: Changed 338, Empty 0; Tup 11494: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.02s/0.00u sec elapsed 0.04 sec.
INFO: --Relation pg_toast.pg_toast_179851--
INFO: Pages 85680: Changed 85680, Empty 0; Tup 343321: Vac 0, Keep 0, UnUsed 0.
Total CPU 4.03s/0.40u sec elapsed 70.99 sec.
VACUUM
Live Server:
comm=# VACUUM VERBOSE StatCon_Tbl;
INFO: --Relation public.statcon_tbl--
INFO: Pages 424: Changed 0, Empty 0; Tup 12291: Vac 0, Keep 0, UnUsed 6101.
Total CPU 0.01s/0.00u sec elapsed 0.60 sec.
INFO: --Relation pg_toast.pg_toast_891830--
INFO: Pages 89234: Changed 0, Empty 0; Tup 352823: Vac 0, Keep 0, UnUsed 5487.
Total CPU 4.44s/0.34u sec elapsed 35.48 sec.
VACUUM
Cheers
Jona
You didn't do analyze.

Chris
Now with analyze.

Test Server:
comm=# VACUUM ANALYZE VERBOSE StatCon_Tbl;
INFO: --Relation public.statcon_tbl--
INFO: Pages 338: Changed 0, Empty 0; Tup 11494: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.02s/0.00u sec elapsed 1.98 sec.
INFO: --Relation pg_toast.pg_toast_179851--
INFO: Pages 85680: Changed 0, Empty 0; Tup 343321: Vac 0, Keep 0, UnUsed 0.
Total CPU 1.75s/0.23u sec elapsed 30.36 sec.
INFO: Analyzing public.statcon_tbl
VACUUM

Live Server:
comm=# VACUUM ANALYZE VERBOSE StatCon_Tbl;
INFO: --Relation public.statcon_tbl--
INFO: Pages 424: Changed 0, Empty 0; Tup 12291: Vac 0, Keep 0, UnUsed 6101.
Total CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO: --Relation pg_toast.pg_toast_891830--
INFO: Pages 89234: Changed 0, Empty 0; Tup 352823: Vac 0, Keep 0, UnUsed 5487.
Total CPU 3.21s/0.47u sec elapsed 18.03 sec.
INFO: Analyzing public.statcon_tbl
VACUUM

Have done some sampling, running the same query a few times over the past few hours, and it appears that the VACUUM has helped. The following are the results after the vacuum:

After VACUUM VERBOSE:
Index Scan using ctp_statcon on statcon_tbl (cost=0.00..21.29 rows=5 width=4) (actual time=0.07..0.37 rows=39 loops=4)
  Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))

After VACUUM ANALYZE VERBOSE:
Index Scan using ctp_statcon on statcon_tbl (cost=0.00..20.03 rows=5 width=4) (actual time=0.09..0.37 rows=39 loops=4)
  Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))

The only question that remains is why one server uses its indexes and the other doesn't, even though VACUUM ANALYZE has now been run on both servers. And even more interesting, before the VACUUM ANALYZE it was the server where no vacuum had taken place that used its index.

Cheers
Jona
Jona <jonanews@oismail.com> writes:
> Test Server:
> comm=# VACUUM ANALYZE VERBOSE StatCon_Tbl;
> INFO: --Relation public.statcon_tbl--
> INFO: Pages 338: Changed 0, Empty 0; Tup 11494: Vac 0, Keep 0, UnUsed 0.
> Total CPU 0.02s/0.00u sec elapsed 1.98 sec.
> INFO: --Relation pg_toast.pg_toast_179851--
> INFO: Pages 85680: Changed 0, Empty 0; Tup 343321: Vac 0, Keep 0, UnUsed 0.
> Total CPU 1.75s/0.23u sec elapsed 30.36 sec.
> INFO: Analyzing public.statcon_tbl
> VACUUM

> Live Server:
> comm=# VACUUM ANALYZE VERBOSE StatCon_Tbl;
> INFO: --Relation public.statcon_tbl--
> INFO: Pages 424: Changed 0, Empty 0; Tup 12291: Vac 0, Keep 0, UnUsed 6101.
> Total CPU 0.00s/0.01u sec elapsed 0.01 sec.
> INFO: --Relation pg_toast.pg_toast_891830--
> INFO: Pages 89234: Changed 0, Empty 0; Tup 352823: Vac 0, Keep 0, UnUsed 5487.
> Total CPU 3.21s/0.47u sec elapsed 18.03 sec.
> INFO: Analyzing public.statcon_tbl
> VACUUM

Hm, the physical table sizes aren't very different, which suggests that the problem must lie with the indexes. Unfortunately, VACUUM in 7.3 doesn't tell you anything about indexes if it doesn't have any dead rows to clean up. Could you look at pg_class.relpages for all the indexes of this table, and see what that shows?

        regards, tom lane
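(For reference, those numbers can be pulled in a single catalog query; a sketch assuming only the standard pg_class/pg_index catalogs, both present in 7.3:

SELECT i.relname AS index_name, i.relpages
FROM pg_class t
JOIN pg_index x ON x.indrelid = t.oid      -- pg_index links a table to each of its indexes
JOIN pg_class i ON i.oid = x.indexrelid    -- the index's own pg_class row carries relpages
WHERE t.relname = 'statcon_tbl';

relpages is only refreshed by VACUUM or ANALYZE, so run one of those first for current numbers.)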
Wouldn't the VACUUM have made them equivalent?
anyway, here's the info for relpages:
Live Server: 424
Test Server: 338
Please note though that there are more rows on the live server than on the test server due to a recent upload.
Total row counts are as follows:
Live Server: 12597
Test Server: 11494
When the problems started the tables had identical size though.
Cheers
Jona
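(For reference: a plain VACUUM in 7.3 only marks dead tuples as reusable; it does not compact the file, so the two page counts would not be expected to converge. Shrinking the table itself would take a VACUUM FULL, e.g.:

VACUUM FULL VERBOSE statcon_tbl;  -- rewrites the table to reclaim space; takes an exclusive lock

As the next message shows, though, the table itself was not the suspect here.)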
Jona <jonanews@oismail.com> writes:
> anyway, here's the info for relpages:
> Live Server: 424
> Test Server: 338

I was asking about the indexes associated with the table, not the table itself.

        regards, tom lane
Sorry Tom, misread your mail! My bad :-(
I believe the following is the data you need:

Live Server:
 relname     | relpages
-------------+----------
 ctp_statcon |       72
 statcon_pk  |      135

Test Server:
 relname     | relpages
-------------+----------
 ctp_statcon |       34
 statcon_pk  |       28
Have executed the following query to obtain that data:
SELECT relname, relpages
FROM pg_class
WHERE relname = 'statcon_pk' OR relname = 'sc2ctp_fk' OR relname = 'sc2mtp_fk' OR relname = 'sc2sc_fk' OR relname = 'ctp_statcon'
The size difference for the indexes is surprisingly big, I think, considering that there are only around 1,000 more rows in the table on the live server than on the test server.
Count for Live Server: 12597
Count for Test Server: 11494
Any insight into this?
Cheers
Jona
PS: The metadata for the table is:
CREATE TABLE statcon_tbl
(
id serial NOT NULL,
data bytea,
wm bool DEFAULT 'FALSE',
created timestamp DEFAULT now(),
modified timestamp DEFAULT now(),
enabled bool DEFAULT 'TRUE',
bitsperpixel int4 DEFAULT 0,
mtpid int4,
sctid int4,
ctpid int4,
CONSTRAINT statcon_pk PRIMARY KEY (id),
CONSTRAINT sc2ctp_fk FOREIGN KEY (ctpid) REFERENCES contype_tbl (id) ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT sc2mtp_fk FOREIGN KEY (mtpid) REFERENCES mimetype_tbl (id) ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT sc2sct_fk FOREIGN KEY (sctid) REFERENCES statcontrans_tbl (id) ON UPDATE CASCADE ON DELETE CASCADE
)
WITHOUT OIDS;
CREATE INDEX ctp_statcon ON statcon_tbl USING btree (sctid, ctpid);
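(For reference: if the live server's indexes do turn out to be bloated, the usual remedy on 7.3 is to rebuild them. A sketch, using the index names from the schema above:

REINDEX INDEX ctp_statcon;   -- rebuild the composite index that showed up oversized
REINDEX INDEX statcon_pk;    -- and the primary-key index
-- or rebuild every index on the table in one statement:
REINDEX TABLE statcon_tbl;

REINDEX locks the table while it runs, so it is best done in a quiet window.)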