BUG #13827: planner chooses more expensive plan than it should - Mailing list pgsql-bugs

From michal.schwarz@gmail.com
Subject BUG #13827: planner chooses more expensive plan than it should
Date
Msg-id 20151218143324.14453.55010@wrigleys.postgresql.org
Whole thread Raw
Responses Re: BUG #13827: planner chooses more expensive plan than it should  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: BUG #13827: planner chooses more expensive plan than it should  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      13827
Logged by:          Michal Schwarz
Email address:      michal.schwarz@gmail.com
PostgreSQL version: 9.2.14
Operating system:   CentOS release 6.5 (Final), x86_64
Description:

Correct behaviour:

=> explain select 1 FROM nakupy_prodeje n, smlouvy s WHERE
n.smlouva_id=s.smlouva_id AND s.osoba_id='900316';
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..18415.76 rows=88 width=0)
   ->  Index Scan using smlouvy_osoba_id on smlouvy s  (cost=0.00..678.38
rows=170 width=4)
         Index Cond: (osoba_id = 900316)
   ->  Index Only Scan using nakupy_prodeje_smlouva_id on nakupy_prodeje n
(cost=0.00..100.99 rows=335 width=4)
         Index Cond: (smlouva_id = s.smlouva_id)
(5 rows)

Total expected cost is 18415 and 88 rows. OK.

Problematic behaviour is when I just add "AND n.datum_realizace is null" to
original WHERE condition. This query should be at least as fast as previous
query, because everything is absolutely the same, and only an ADDITIONAL
condition "AND n.datum_realizace is null" was used.

But PostgreSQL chooses much expensive query plan this time, even when it
could just use plan similar to previous with only additional filtering
applied to original 88 expected rows:

=> explain select 1 FROM nakupy_prodeje n, smlouvy s WHERE
n.smlouva_id=s.smlouva_id AND s.osoba_id='900316' AND n.datum_realizace is
null;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=680.51..42187.98 rows=3 width=0)
   Hash Cond: (n.smlouva_id = s.smlouva_id)
   ->  Index Scan using nakupy_prodeje_datum_realizace on nakupy_prodeje n
(cost=0.00..40380.37 rows=300551 width=4)
         Index Cond: (datum_realizace IS NULL)
   ->  Hash  (cost=678.38..678.38 rows=170 width=4)
         ->  Index Scan using smlouvy_osoba_id on smlouvy s
(cost=0.00..678.38 rows=170 width=4)
               Index Cond: (osoba_id = 900316)
(7 rows)

Why did PG choose a plan with cost=41287, when it could use much simpler
plan based on previous example?

Or, more specifically, why is PG using Index Cond for "datum_realizace IS
NULL" with 300000 resulting expected rows, when it could apparently use
simpler plan.

For me, it is like a difference between optimal:
  "how to count all people in room?" => "let's look at them ... yes, there
are 5 people here"
and extremely sub-optimal:
  "how to count all people WITH BEARD in room?" => "let's count ALL THE
PEOPLE IN THE TOWN (there will be approx. 300000 of them), then find out all
WITH BEARD, and finally look whether any of them is in this room".
:-)

Table structures are:

=> \d nakupy_prodeje
                                                    Table
"public.nakupy_prodeje"
          Column           |            Type             |
              Modifiers

---------------------------+-----------------------------+----------------------------------------------------------------------------
 prikaz_id                 | integer                     | not null default
nextval(('nakupy_prodeje_prikaz_id_seq'::text)::regclass)
 smlouva_id                | integer                     | not null
 datum_realizace           | timestamp without time zone |
Indexes:
    "nakupy_prodeje_pkey" PRIMARY KEY, btree (prikaz_id)
    "nakupy_prodeje_datum_realizace" btree (datum_realizace)
    "nakupy_prodeje_smlouva_id" btree (smlouva_id)

=> \d smlouvy
                                                           Table
"public.smlouvy"
                Column                 |            Type             |
                       Modifiers

---------------------------------------+-----------------------------+----------------------------------------------------------------------
 smlouva_id                            | integer                     | not
null default nextval(('smlouvy_smlouva_id_seq'::text)::regclass)
 osoba_id                              | integer                     | not
null
Indexes:
    "smlouvy_pkey" PRIMARY KEY, btree (smlouva_id)
    "smlouvy_osoba_id" btree (osoba_id)


And ANALYZE was run on these tables.

pgsql-bugs by date:

Previous
From: George Wesington
Date:
Subject: Re: BUG #13823: Database crash - import requests in plpython3u
Next
From: Pavel Stehule
Date:
Subject: Re: BUG #13827: planner chooses more expensive plan than it should