Thread: Postgres 8.4 planner question - bad plan, good plan for almost same queries.
Running PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.6.real (Ubuntu/Linaro 4.6.0-7ubuntu1) 4.6.1, 64-bit under Ubuntu 11.10. Got large table with geography (PostGIS 1.5) data, running two almost same queries with same result, but very different performance. First query: with uuu as ( select dml.id from mp_locs12 dml where (complex conditions, leaving about 100 rows from millions) ) select label, country, region, parish, city, district, st_geometrytype(loc::geometry) as gtype, '0x' || to_hex(type) as n_type, file_name, line from mp_locs12 dml1 wheredml1.id in (select uu.id from uuu uu) and not exists ( select 1 from mp_locs12 dml2 wheredml2.id in (select uu.id from uuu uu) and dml2.id <> dml1.id and not st_contains(dml1.loc::geometry,dml2.loc::geometry) ); Planner choose to seqscan dml2 table in NOT EXISTS condition, very surprising for me - "dml2.id in (select uu.id from uuu uu)" is a best cut off here I believe, but planner thinks different (for dml1 it does not). QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ NestedLoop Anti Join (cost=2451523.67..8486747.31 rows=1 width=2130) Join Filter: ((dml2.id <> dml1.id) AND ((NOT ((dml1.loc)::geometry && (dml2.loc)::geometry)) OR (NOT _st_contains((dml1.loc)::geometry, (dml2.loc)::geometry)))) CTE uuu -> Seq Scan on mp_locs12 dml (cost=0.00..2451523.62 rows=1 width=4) Filter:(complex conditions, leaving about 100 rows from millions) -> Nested Loop (cost=0.02..8.85 rows=1 width=2134) -> HashAggregate (cost=0.02..0.03 rows=1 width=4) -> CTE Scan on uuu uu (cost=0.00..0.02 rows=1 width=4) -> Index Scan using mp_locs12_pkeyon mp_locs12 dml1 (cost=0.00..8.81 rows=1 width=2134) Index Cond: (dml1.id = uu.id) -> Seq Scan on mp_locs12 dml2 (cost=0.02..1750366.67rows=15581266 width=1946) Filter: (hashed SubPlan 2) SubPlan 2 -> CTE Scan on uuu uu (cost=0.00..0.02 rows=1width=4) (14 rows) Now try to use bit modified query: explain with uuu as ( select dml.id from mp_locs12 dml where (complex conditions, leaving about 100 rows frommillions) ) select label, country, region, parish, city, district, st_geometrytype(loc::geometry) as gtype, '0x' || to_hex(type) as n_type, file_name, line from mp_locs12 dml1 wheredml1.id in (select uu.id from uuu uu) and lower(st_geometrytype(dml1.loc::geometry)) not in ('st_geometrycollection') and not exists ( select 1 from ( select dml2.id as id from mp_locs12dml2 where dml2.id in (select uu.id from uuu uu) and not st_contains(dml1.loc::geometry, dml2.loc::geometry) and lower(st_geometrytype(dml2.loc::geometry)) not in ('st_geometrycollection') ) vv where vv.id <> dml1.id ); Only thing I changed - scanned CTE for ids to get, then compared taken ids with dml1.id. And now planner chose best plan (I think so): QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NestedLoop (cost=2139898.32..2139916.30 rows=1 width=2130) CTE uuu -> Seq Scan on mp_locs12 dml (cost=0.00..2139898.30rows=1 width=4) Filter: (complex conditions, leaving about 100 rows from millions) -> HashAggregate (cost=0.02..0.03 rows=1 width=4) -> CTE Scan on uuu uu (cost=0.00..0.02 rows=1 width=4) -> Index Scan using mp_locs12_pkey on mp_locs12 dml1 (cost=0.00..17.95 rows=1 width=2134) Index Cond: (dml1.id = uu.id) Filter: ((lower(st_geometrytype((dml1.loc)::geometry))<> 'st_geometrycollection'::text) AND (NOT (SubPlan 2))) SubPlan 2 -> Nested Loop (cost=0.02..9.13 rows=1width=0) -> HashAggregate (cost=0.02..0.03 rows=1 width=4) -> CTE Scan onuuu uu (cost=0.00..0.02 rows=1 width=4) -> Index Scan using mp_locs12_pkey on mp_locs12 dml2 (cost=0.00..9.08 rows=1 width=4) Index Cond: (dml2.id = uu.id) Filter: ((dml2.id <> $2) AND (lower(st_geometrytype((dml2.loc)::geometry)) <> 'st_geometrycollection'::text) AND ((NOT (($1)::geometry && (dml2.loc)::geometry)) OR (NOT _st_contains(($1)::geometry, (dml2.loc)::geometry)))) (16 rows) What makes planner o choose so differently in almost same cases? Is it a bug, or I misunderstood something? Regards Dmitry
Re: Postgres 8.4 planner question - bad plan, good plan for almost same queries.
From
Tom Lane
Date:
Дмитрий <fozzy@ac-sw.com> writes: > Running PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC > gcc-4.6.real (Ubuntu/Linaro 4.6.0-7ubuntu1) 4.6.1, 64-bit > under Ubuntu 11.10. > with uuu as ( > select dml.id > from mp_locs12 dml > where (complex conditions, leaving about 100 rows from millions) > ) > select label, country, region, parish, city, district, > st_geometrytype(loc::geometry) as gtype, > '0x' || to_hex(type) as n_type, file_name, line > from mp_locs12 dml1 > where dml1.id in (select uu.id from uuu uu) > and not exists ( > select 1 from mp_locs12 dml2 > where dml2.id in (select uu.id from uuu uu) > and dml2.id <> dml1.id > and not st_contains(dml1.loc::geometry, dml2.loc::geometry) > ); I think the reason the planner isn't too bright about this case is http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=cd1f0d04bf06938c0ee5728fc8424d62bcf2eef3 ie, it won't do IN/EXISTS pullup below a NOT EXISTS. HEAD is better, thanks to commit 0816fad6eebddb8f1f0e21635e46625815d690b9, but of course there is no chance at all of back-patching the planner changes that depends on. regards, tom lane
> I think the reason the planner isn't too bright about this case is > http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=cd1f0d04bf06938c0ee5728fc8424d62bcf2eef3 > ie, it won't do IN/EXISTS pullup below a NOT EXISTS. > > HEAD is better, thanks to commit > 0816fad6eebddb8f1f0e21635e46625815d690b9, but of course there is no > chance at all of back-patching the planner changes that depends on. I found that it works fine for [NOT] EXISTS if I just fold query inside into "select 1 from (...) vv". With my provided query, it uses seqscan for both EXISTS/NOT EXISTS without folding, and index scan with it. Okay, it's easier for me to make automatic subquery folding. Regards, Dmitry