Postgres 8.4 planner question - bad plan, good plan for almost same queries. - Mailing list pgsql-hackers

From Дмитрий
Subject Postgres 8.4 planner question - bad plan, good plan for almost same queries.
Date
Msg-id 4F683BFC.5030400@ac-sw.com
Whole thread Raw
Responses Re: Postgres 8.4 planner question - bad plan, good plan for almost same queries.
List pgsql-hackers
Running PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC 
gcc-4.6.real (Ubuntu/Linaro 4.6.0-7ubuntu1) 4.6.1, 64-bit
under Ubuntu 11.10.

Got large table with geography (PostGIS 1.5) data, running two almost 
same queries with same result, but very different performance.

First query:

with uuu as (    select dml.id      from mp_locs12 dml    where (complex conditions, leaving about 100 rows from
millions) )  select label, country, region, parish, city, district, 
 
st_geometrytype(loc::geometry) as gtype,    '0x' || to_hex(type) as n_type, file_name, line    from mp_locs12 dml1
wheredml1.id in (select uu.id from uuu uu)        and not exists (          select 1 from mp_locs12 dml2
wheredml2.id in (select uu.id from uuu uu)              and dml2.id <> dml1.id              and not
st_contains(dml1.loc::geometry,dml2.loc::geometry)        );
 

Planner choose to seqscan dml2 table in NOT EXISTS condition, very 
surprising for me - "dml2.id in (select uu.id from uuu uu)" is a best 
cut off here I believe, but planner thinks different (for dml1 it does not).

                  QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NestedLoop Anti Join  (cost=2451523.67..8486747.31 rows=1 width=2130)   Join Filter: ((dml2.id <> dml1.id) AND ((NOT
((dml1.loc)::geometry
 
&& (dml2.loc)::geometry)) OR (NOT _st_contains((dml1.loc)::geometry, 
(dml2.loc)::geometry))))   CTE uuu     ->  Seq Scan on mp_locs12 dml  (cost=0.00..2451523.62 rows=1 width=4)
Filter:(complex conditions, leaving about 100 rows from 
 
millions)   ->  Nested Loop  (cost=0.02..8.85 rows=1 width=2134)         ->  HashAggregate  (cost=0.02..0.03 rows=1
width=4)              ->  CTE Scan on uuu uu  (cost=0.00..0.02 rows=1 width=4)         ->  Index Scan using
mp_locs12_pkeyon mp_locs12 dml1 
 
(cost=0.00..8.81 rows=1 width=2134)               Index Cond: (dml1.id = uu.id)   ->  Seq Scan on mp_locs12 dml2
(cost=0.02..1750366.67rows=15581266 
 
width=1946)         Filter: (hashed SubPlan 2)         SubPlan 2           ->  CTE Scan on uuu uu  (cost=0.00..0.02
rows=1width=4)
 
(14 rows)


Now try to use bit modified query:

explain with uuu as (    select dml.id      from mp_locs12 dml    where (complex conditions, leaving about 100 rows
frommillions)  )  select label, country, region, parish, city, district, 
 
st_geometrytype(loc::geometry) as gtype,    '0x' || to_hex(type) as n_type, file_name, line    from mp_locs12 dml1
wheredml1.id in (select uu.id from uuu uu)        and lower(st_geometrytype(dml1.loc::geometry)) not in 
 
('st_geometrycollection')        and not exists (          select 1 from (              select dml2.id as id from
mp_locs12dml2                where dml2.id in (select uu.id from uuu uu)                  and not
st_contains(dml1.loc::geometry,
 
dml2.loc::geometry)                  and lower(st_geometrytype(dml2.loc::geometry)) not in 
('st_geometrycollection')            ) vv            where vv.id <> dml1.id        );

Only thing I changed - scanned CTE for ids to get, then compared taken 
ids with dml1.id. And now planner chose best plan (I think so):

                                                 QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NestedLoop  (cost=2139898.32..2139916.30 rows=1 width=2130)   CTE uuu     ->  Seq Scan on mp_locs12 dml
(cost=0.00..2139898.30rows=1 width=4)           Filter: (complex conditions, leaving about 100 rows from 
 
millions)   ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4)         ->  CTE Scan on uuu uu  (cost=0.00..0.02 rows=1
width=4)  ->  Index Scan using mp_locs12_pkey on mp_locs12 dml1 
 
(cost=0.00..17.95 rows=1 width=2134)         Index Cond: (dml1.id = uu.id)         Filter:
((lower(st_geometrytype((dml1.loc)::geometry))<> 
 
'st_geometrycollection'::text) AND (NOT (SubPlan 2)))         SubPlan 2           ->  Nested Loop  (cost=0.02..9.13
rows=1width=0)                 ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4)                       ->  CTE Scan
onuuu uu  (cost=0.00..0.02 rows=1 
 
width=4)                 ->  Index Scan using mp_locs12_pkey on mp_locs12 dml2  (cost=0.00..9.08 rows=1 width=4)
              Index Cond: (dml2.id = uu.id)                       Filter: ((dml2.id <> $2) AND 
 
(lower(st_geometrytype((dml2.loc)::geometry)) <> 
'st_geometrycollection'::text) AND ((NOT (($1)::geometry && 
(dml2.loc)::geometry)) OR (NOT _st_contains(($1)::geometry, 
(dml2.loc)::geometry))))
(16 rows)



What makes planner o choose so differently in almost same cases? Is it a 
bug, or I misunderstood something?

Regards
Dmitry


pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [PATCH] Support for foreign keys with arrays
Next
From: Daniel Farina
Date:
Subject: Re: pg_terminate_backend for same-role