Thread: Postgres 8.4 planner question - bad plan, good plan for almost same queries.

Postgres 8.4 planner question - bad plan, good plan for almost same queries.

From
Дмитрий
Date:
Running PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC 
gcc-4.6.real (Ubuntu/Linaro 4.6.0-7ubuntu1) 4.6.1, 64-bit
under Ubuntu 11.10.

Got large table with geography (PostGIS 1.5) data, running two almost 
same queries with same result, but very different performance.

First query:

with uuu as (    select dml.id      from mp_locs12 dml    where (complex conditions, leaving about 100 rows from
millions) )  select label, country, region, parish, city, district, 
 
st_geometrytype(loc::geometry) as gtype,    '0x' || to_hex(type) as n_type, file_name, line    from mp_locs12 dml1
wheredml1.id in (select uu.id from uuu uu)        and not exists (          select 1 from mp_locs12 dml2
wheredml2.id in (select uu.id from uuu uu)              and dml2.id <> dml1.id              and not
st_contains(dml1.loc::geometry,dml2.loc::geometry)        );
 

Planner choose to seqscan dml2 table in NOT EXISTS condition, very 
surprising for me - "dml2.id in (select uu.id from uuu uu)" is a best 
cut off here I believe, but planner thinks different (for dml1 it does not).

                  QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NestedLoop Anti Join  (cost=2451523.67..8486747.31 rows=1 width=2130)   Join Filter: ((dml2.id <> dml1.id) AND ((NOT
((dml1.loc)::geometry
 
&& (dml2.loc)::geometry)) OR (NOT _st_contains((dml1.loc)::geometry, 
(dml2.loc)::geometry))))   CTE uuu     ->  Seq Scan on mp_locs12 dml  (cost=0.00..2451523.62 rows=1 width=4)
Filter:(complex conditions, leaving about 100 rows from 
 
millions)   ->  Nested Loop  (cost=0.02..8.85 rows=1 width=2134)         ->  HashAggregate  (cost=0.02..0.03 rows=1
width=4)              ->  CTE Scan on uuu uu  (cost=0.00..0.02 rows=1 width=4)         ->  Index Scan using
mp_locs12_pkeyon mp_locs12 dml1 
 
(cost=0.00..8.81 rows=1 width=2134)               Index Cond: (dml1.id = uu.id)   ->  Seq Scan on mp_locs12 dml2
(cost=0.02..1750366.67rows=15581266 
 
width=1946)         Filter: (hashed SubPlan 2)         SubPlan 2           ->  CTE Scan on uuu uu  (cost=0.00..0.02
rows=1width=4)
 
(14 rows)


Now try to use bit modified query:

explain with uuu as (    select dml.id      from mp_locs12 dml    where (complex conditions, leaving about 100 rows
frommillions)  )  select label, country, region, parish, city, district, 
 
st_geometrytype(loc::geometry) as gtype,    '0x' || to_hex(type) as n_type, file_name, line    from mp_locs12 dml1
wheredml1.id in (select uu.id from uuu uu)        and lower(st_geometrytype(dml1.loc::geometry)) not in 
 
('st_geometrycollection')        and not exists (          select 1 from (              select dml2.id as id from
mp_locs12dml2                where dml2.id in (select uu.id from uuu uu)                  and not
st_contains(dml1.loc::geometry,
 
dml2.loc::geometry)                  and lower(st_geometrytype(dml2.loc::geometry)) not in 
('st_geometrycollection')            ) vv            where vv.id <> dml1.id        );

Only thing I changed - scanned CTE for ids to get, then compared taken 
ids with dml1.id. And now planner chose best plan (I think so):

                                                 QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NestedLoop  (cost=2139898.32..2139916.30 rows=1 width=2130)   CTE uuu     ->  Seq Scan on mp_locs12 dml
(cost=0.00..2139898.30rows=1 width=4)           Filter: (complex conditions, leaving about 100 rows from 
 
millions)   ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4)         ->  CTE Scan on uuu uu  (cost=0.00..0.02 rows=1
width=4)  ->  Index Scan using mp_locs12_pkey on mp_locs12 dml1 
 
(cost=0.00..17.95 rows=1 width=2134)         Index Cond: (dml1.id = uu.id)         Filter:
((lower(st_geometrytype((dml1.loc)::geometry))<> 
 
'st_geometrycollection'::text) AND (NOT (SubPlan 2)))         SubPlan 2           ->  Nested Loop  (cost=0.02..9.13
rows=1width=0)                 ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4)                       ->  CTE Scan
onuuu uu  (cost=0.00..0.02 rows=1 
 
width=4)                 ->  Index Scan using mp_locs12_pkey on mp_locs12 dml2  (cost=0.00..9.08 rows=1 width=4)
              Index Cond: (dml2.id = uu.id)                       Filter: ((dml2.id <> $2) AND 
 
(lower(st_geometrytype((dml2.loc)::geometry)) <> 
'st_geometrycollection'::text) AND ((NOT (($1)::geometry && 
(dml2.loc)::geometry)) OR (NOT _st_contains(($1)::geometry, 
(dml2.loc)::geometry))))
(16 rows)



What makes planner o choose so differently in almost same cases? Is it a 
bug, or I misunderstood something?

Regards
Dmitry


Дмитрий <fozzy@ac-sw.com> writes:
> Running PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC 
> gcc-4.6.real (Ubuntu/Linaro 4.6.0-7ubuntu1) 4.6.1, 64-bit
> under Ubuntu 11.10.

> with uuu as (
>      select dml.id
>        from mp_locs12 dml
>      where (complex conditions, leaving about 100 rows from millions)
>    )
>    select label, country, region, parish, city, district, 
> st_geometrytype(loc::geometry) as gtype,
>      '0x' || to_hex(type) as n_type, file_name, line
>      from mp_locs12 dml1
>      where dml1.id in (select uu.id from uuu uu)
>          and not exists (
>            select 1 from mp_locs12 dml2
>              where dml2.id in (select uu.id from uuu uu)
>                and dml2.id <> dml1.id
>                and not st_contains(dml1.loc::geometry, dml2.loc::geometry)
>          );

I think the reason the planner isn't too bright about this case is
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=cd1f0d04bf06938c0ee5728fc8424d62bcf2eef3
ie, it won't do IN/EXISTS pullup below a NOT EXISTS.

HEAD is better, thanks to commit
0816fad6eebddb8f1f0e21635e46625815d690b9, but of course there is no
chance at all of back-patching the planner changes that depends on.
        regards, tom lane


> I think the reason the planner isn't too bright about this case is
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=cd1f0d04bf06938c0ee5728fc8424d62bcf2eef3
> ie, it won't do IN/EXISTS pullup below a NOT EXISTS.
>
> HEAD is better, thanks to commit
> 0816fad6eebddb8f1f0e21635e46625815d690b9, but of course there is no
> chance at all of back-patching the planner changes that depends on.

I found that it works fine for [NOT] EXISTS if I just fold query inside 
into "select 1 from (...) vv". With my provided query, it uses seqscan 
for both EXISTS/NOT EXISTS without folding, and index scan with it.

Okay, it's easier for me to make automatic subquery folding.

Regards,
Dmitry