Thread: Optimizer Not using the Right plan
Hi, Postgres 8.2.4 is not using the right plan for different values. From the below queries listing.addressvaluation table has 19million records , the other table listing.valuationchangeperiod is just lookup table with 3 records. If you can see the explain plans for the statements the first one uses a bad plan for 737987 addressid search, does a index scan backward on the primary key "addressvaluationid" takes more time to execute and the same query for a different addressid (5851202) uses the correct optimal plan with index scan on "addressid" column which is way quicker. Autovacuums usually vacuums these tables regularly, in fact I checked the pg_stat_user_tables the last vacuum/analyze on this table was last night. I did another manual vacuum analyze on the listing.addrevaluation table it uses the right plan for all the values now. Can anyone explain me this wierd behavior ? why does it have different plans for different values and after doing manual vacuum analyze it works properly ? Are autovacuums not effective enough ? Here are my autovacuum settings autovacuum_naptime = 120min autovacuum_vacuum_threshold = 500 autovacuum_analyze_threshold = 250 autovacuum_vacuum_scale_factor = 0.001 autovacuum_analyze_scale_factor = 0.001 autovacuum_freeze_max_age = 200000000 autovacuum_vacuum_cost_delay = -1 autovacuum_vacuum_cost_limit = -1 Here are the table structures listing.addressvaluation Table "listing.addressvaluation" Column | Type | Modifiers ----------------------------+-----------------------------+--------------------------------------------------------------------------- addressvaluationid | integer | not null default nextval(('listing.addressvaluationseq'::text)::regclass) fkaddressid | integer | not null fkaddressvaluationsourceid | integer | not null sourcereference | text | createdate | timestamp without time zone | not null default ('now'::text)::timestamp(6) without time zone valuationdate | timestamp without time zone | not null valuationamount | numeric(14,2) | valuationhigh | numeric(14,2) | valuationlow | numeric(14,2) | valuationconfidence | integer | valuationchange | numeric(14,2) | fkvaluationchangeperiodid | integer | historycharturl | text | regionhistorycharturl | text | Indexes: "pk_addressvaluation_addressvaluationid" PRIMARY KEY, btree (addressvaluationid), tablespace "indexdata" "idx_addressvaluation_createdate" btree (createdate), tablespace "indexdata" "idx_addressvaluation_fkaddressid" btree (fkaddressid), tablespace "indexdata" "idx_addressvaluation_fkaddressid2" btree (fkaddressid), tablespace "indexdata" Foreign-key constraints: "fk_addressvaluation_address" FOREIGN KEY (fkaddressid) REFERENCES listing.address(addressid) "fk_addressvaluation_addressvaluationsource" FOREIGN KEY (fkaddressvaluationsourceid) REFERENCES listing.addressvaluationsource(addressvaluationsourceid) "fk_addressvaluation_valuationchangeperiod" FOREIGN KEY (fkvaluationchangeperiodid) REFERENCES listing.valuationchangeperiod(valuationchangeperiodid) listing.valuationchangeperiod Table "listing.valuationchangeperiod" Column | Type | Modifiers -------------------------+---------+-------------------------------------------------------------------------------- valuationchangeperiodid | integer | not null default nextval(('listing.valuationchangeperiodseq'::text)::regclass) name | text | not null Indexes: "pk_valuationchangeperiod_valuationchangeperiodid" PRIMARY KEY, btree (valuationchangeperiodid), tablespace "indexdata" "uq_valuationchangeperiod_name" UNIQUE, btree (name), tablespace "indexdata" For Addressid 737987 after autovacuum before manual vacuum analyze ------------------------------------------------------------------------------------------- explain select this_.addressvaluationid as addressv1_150_1_, this_.sourcereference as sourcere2_150_1_, this_.createdate as createdate150_1_, this_.valuationdate as valuatio4_150_1_, this_.valuationamount as valuatio5_150_1_, this_.valuationhigh as valuatio6_150_1_, this_.valuationlow as valuatio7_150_1_, this_.valuationconfidence as valuatio8_150_1_, this_.valuationchange as valuatio9_150_1_, this_.historycharturl as history10_150_1_, this_.regionhistorycharturl as regionh11_150_1_, this_.fkaddressid as fkaddre12_150_1_, this_.fkaddressvaluationsourceid as fkaddre13_150_1_, this_.fkvaluationchangeperiodid as fkvalua14_150_1_, valuationc2_.valuationchangeperiodid as valuatio1_197_0_, valuationc2_.name as name197_0_ from listing.addressvaluation this_ left outer join listing.valuationchangeperiod valuationc2_ on this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodid where this_.fkaddressid=737987 order by this_.addressvaluationid desc limit 1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..678.21 rows=1 width=494) -> Nested Loop Left Join (cost=0.00..883026.09 rows=1302 width=494) -> Index Scan Backward using pk_addressvaluation_addressvaluationid on addressvaluation this_ (cost=0.00..882649.43 rows=1302 width=482) Filter: (fkaddressid = 737987) -> Index Scan using pk_valuationchangeperiod_valuationchangeperiodid on valuationchangeperiod valuationc2_ (cost=0.00..0.28 rows=1 width=12) Index Cond: (this_.fkvaluationchangeperiodid = valuationc2_.valuationchangeperiodid) (6 rows) For Addressid 5851202 after autovacuum before manual vacuum analyze -------------------------------------------------------------------------------------------- select this_.addressvaluationid as addressv1_150_1_, this_.sourcereference as sourcere2_150_1_, this_.createdate as createdate150_1_, this_.valuationdate as valuatio4_150_1_, this_.valuationamount as valuatio5_150_1_, this_.valuationhigh as valuatio6_150_1_, this_.valuationlow as valuatio7_150_1_, this_.valuationconfidence as valuatio8_150_1_, this_.valuationchange as valuatio9_150_1_, this_.historycharturl as history10_150_1_, this_.regionhistorycharturl as regionh11_150_1_, this_.fkaddressid as fkaddre12_150_1_, this_.fkaddressvaluationsourceid as fkaddre13_150_1_, this_.fkvaluationchangeperiodid as fkvalua14_150_1_, valuationc2_.valuationchangeperiodid as valuatio1_197_0_, valuationc2_.name as name197_0_ from listing.addressvaluation this_ left outer join listing.valuationchangeperiod valuationc2_ on this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodid where this_.fkaddressid=5851202 order by this_.addressvaluationid desc limit 1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=30.68..30.68 rows=1 width=494) -> Sort (cost=30.68..30.71 rows=11 width=494) Sort Key: this_.addressvaluationid -> Hash Left Join (cost=1.07..30.49 rows=11 width=494) Hash Cond: (this_.fkvaluationchangeperiodid = valuationc2_.valuationchangeperiodid) -> Index Scan using idx_addressvaluation_fkaddressid2 on addressvaluation this_ (cost=0.00..29.27 rows=11 width=482) Index Cond: (fkaddressid = 5851202) -> Hash (cost=1.03..1.03 rows=3 width=12) -> Seq Scan on valuationchangeperiod valuationc2_ (cost=0.00..1.03 rows=3 width=12) (9 rows) After manual vacuum analyze for addressid 737987 ------------------------------------------------------------------ explain select this_.addressvaluationid as addressv1_150_1_, this_.sourcereference as sourcere2_150_1_, this_.createdate as createdate150_1_, this_.valuationdate as valuatio4_150_1_, this_.valuationamount as valuatio5_150_1_, this_.valuationhigh as valuatio6_150_1_, this_.valuationlow as valuatio7_150_1_, this_.valuationconfidence as valuatio8_150_1_, this_.valuationchange as valuatio9_150_1_, this_.historycharturl as history10_150_1_, this_.regionhistorycharturl as regionh11_150_1_, this_.fkaddressid as fkaddre12_150_1_, this_.fkaddressvaluationsourceid as fkaddre13_150_1_, this_.fkvaluationchangeperiodid as fkvalua14_150_1_, valuationc2_.valuationchangeperiodid as valuatio1_197_0_, valuationc2_.name as name197_0_ from listing.addressvaluation this_ inner join listing.valuationchangeperiod valuationc2_ on this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodid where this_.fkaddressid=737987 order by this_.addressvaluationid desc limit 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=31.24..31.24 rows=1 width=494) -> Sort (cost=31.24..31.27 rows=11 width=494) Sort Key: this_.addressvaluationid -> Hash Join (cost=1.07..31.05 rows=11 width=494) Hash Cond: (this_.fkvaluationchangeperiodid = valuationc2_.valuationchangeperiodid) -> Index Scan using idx_addressvaluation_fkaddressid on addressvaluation this_ (cost=0.00..29.83 rows=11 width=482) Index Cond: (fkaddressid = 737987) -> Hash (cost=1.03..1.03 rows=3 width=12) -> Seq Scan on valuationchangeperiod valuationc2_ (cost=0.00..1.03 rows=3 width=12) (9 rows) Thanks! Pallav.
Pallav Kalva <pkalva@livedatagroup.com> writes: > why does it have different plans for different values Because the values occur different numbers of times (or so it thinks anyway). If the rowcount estimates are far from reality, perhaps increasing the statistics target would help. However, since you only showed EXPLAIN and not EXPLAIN ANALYZE output, no one can really tell whether the optimizer did anything wrong here. regards, tom lane
Tom Lane wrote: > Pallav Kalva <pkalva@livedatagroup.com> writes: > >> why does it have different plans for different values >> > > Because the values occur different numbers of times (or so it thinks > anyway). If the rowcount estimates are far from reality, perhaps > increasing the statistics target would help. However, since you > only showed EXPLAIN and not EXPLAIN ANALYZE output, no one can > really tell whether the optimizer did anything wrong here. > > regards, tom lane > Hi Tom, Thanks! for your reply, here is an another example of the same query with different addressid now. This time I got the explain analyze on the query, this query also uses the Index Scan Backwards, it says it took 28 seconds but I can say that after looking at the postgres logs it took more than 2 min when the query first ran. I ran this one again to get the explain analyze. The statistics set to "default_statistics_target = 100" I am sure if it uses index on addressid it would be quicker but for some reason it using index backward scan on addressvaluationid and that is taking too long. Not only this one there are some other queries which use index scan backwards scan and it takes too long. Index scan backwards most of the time is not doing good for me is there any way to avoid it ? explain analyze select this_.addressvaluationid as addressv1_150_1_, this_.sourcereference as sourcere2_150_1_, this_.createdate as createdate150_1_, this_.valuationdate as valuatio4_150_1_, this_.valuationamount as valuatio5_150_1_, this_.valuationhigh as valuatio6_150_1_, this_.valuationlow as valuatio7_150_1_, this_.valuationconfidence as valuatio8_150_1_, this_.valuationchange as valuatio9_150_1_, this_.historycharturl as history10_150_1_, this_.regionhistorycharturl as regionh11_150_1_, this_.fkaddressid as fkaddre12_150_1_, this_.fkaddressvaluationsourceid as fkaddre13_150_1_, this_.fkvaluationchangeperiodid as fkvalua14_150_1_, valuationc2_.valuationchangeperiodid as valuatio1_197_0_, valuationc2_.name as name197_0_ from listing.addressvaluation this_ left outer join listing.valuationchangeperiod valuationc2_ on this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodid where this_.fkaddressid= 6664161 order by this_.addressvaluationid desc limit 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..677.69 rows=1 width=494) (actual time=28454.708..28454.712 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..883705.44 rows=1304 width=494) (actual time=28454.700..28454.700 rows=1 loops=1) -> Index Scan Backward using pk_addressvaluation_addressvaluationid on addressvaluation this_ (cost=0.00..883328.22 rows=1304 width=482) (actual time=28441.236..28441.236 rows=1 loops=1) Filter: (fkaddressid = 6664161) -> Index Scan using pk_valuationchangeperiod_valuationchangeperiodid on valuationchangeperiod valuationc2_ (cost=0.00..0.28 rows=1 width=12) (actual time=13.447..13.447 rows=1 loops=1) Index Cond: (this_.fkvaluationchangeperiodid = valuationc2_.valuationchangeperiodid) Total runtime: 28454.789 ms (7 rows)
I think you may increase the row number that you want to limit, like LIMIT 50.
LIMIT can change the cost of a plan dramatically. Looking in your SQL:
where this_.fkaddressid= 6664161
order by this_.addressvaluationid desc limit 1;
where this_.fkaddressid= 6664161
order by this_.addressvaluationid desc limit 1;
Planner may use either index1(this_.fkaddressid) or index2(this_.addressvaluationid) to scan the table. Although it is obvious that using index2 is very expensive, but because you are trying to limit one row from 1304 row, so the cost of using index2 changes to
883328/1304=677.69
The cost of using index1 should be lager than 1304, so planner chooses index2.
Planner tends to choose a plan which has small startup cost when you are trying to LIMIT a small portion of data over a large data set. It seems that the following issue also comes from the same root.
http://archives.postgresql.org/pgsql-performance/2007-11/msg00395.php
Best Regards
Galy Lee
Tom Lane wrote:Pallav Kalva <pkalva ( at ) livedatagroup ( dot ) com> writes:why does it have different plans for different valuesBecause the values occur different numbers of times (or so it thinks anyway). If the rowcount estimates are far from reality, perhaps increasing the statistics target would help. However, since you only showed EXPLAIN and not EXPLAIN ANALYZE output, no one can really tell whether the optimizer did anything wrong here. regards, tom laneHi Tom,Thanks! for your reply, here is an another example of the same query with different addressid now. This time I got the explain analyze on the query, this query also uses the Index Scan Backwards, it says it took 28 seconds but I can say that after looking at the postgres logs it took more than 2 min when the query first ran. I ran this one again to get the explain analyze.The statistics set to "default_statistics_target = 100"I am sure if it uses index on addressid it would be quicker but for some reason it using index backward scan on addressvaluationid and that is taking too long. Not only this one there are some other queries which use index scan backwards scan and it takes too long. Index scan backwards most of the time is not doing good for me is there any way to avoid it ?explain analyzeselect this_.addressvaluationid as addressv1_150_1_, this_.sourcereference as sourcere2_150_1_, this_.createdate as createdate150_1_, this_.valuationdate as valuatio4_150_1_, this_.valuationamount as valuatio5_150_1_, this_.valuationhigh as valuatio6_150_1_, this_.valuationlow as valuatio7_150_1_, this_.valuationconfidence as valuatio8_150_1_, this_.valuationchange as valuatio9_150_1_, this_.historycharturl as history10_150_1_, this_.regionhistorycharturl as regionh11_150_1_, this_.fkaddressid as fkaddre12_150_1_, this_.fkaddressvaluationsourceid as fkaddre13_150_1_, this_.fkvaluationchangeperiodid as fkvalua14_150_1_, valuationc2_.valuationchangeperiodid as valuatio1_197_0_,valuationc2_.name as name197_0_from listing.addressvaluation this_ left outer join listing.valuationchangeperiod valuationc2_ on this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodidwhere this_.fkaddressid= 6664161 order by this_.addressvaluationid desc limit 1;QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..677.69 rows=1 width=494) (actual time=28454.708..28454.712 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..883705.44 rows=1304 width=494) (actual time=28454.700..28454.700 rows=1 loops=1) -> Index Scan Backward using pk_addressvaluation_addressvaluationid on addressvaluation this_ (cost=0.00..883328.22 rows=1304 width=482) (actual time=28441.236..28441.236 rows=1 loops=1)Filter: (fkaddressid = 6664161)-> Index Scan using pk_valuationchangeperiod_valuationchangeperiodid on valuationchangeperiod valuationc2_ (cost=0.00..0.28 rows=1 width=12) (actual time=13.447..13.447 rows=1 loops=1) Index Cond: (this_.fkvaluationchangeperiodid = valuationc2_.valuationchangeperiodid)Total runtime: 28454.789 ms (7 rows)