understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr - Mailing list pgsql-general

From john gale
Subject understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr
Date
Msg-id B3F6AE48-E10E-4A91-A450-241B12B6312A@smadness.com
Whole thread Raw
Responses Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr  (Shaun Thomas <sthomas@optionshouse.com>)
Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr  (Marc Mamin <M.Mamin@intershop.de>)
List pgsql-general
I would be appreciative if somebody could help explain why we have two nearly identical queries taking two different
plannerroutes;  one a nested index loop that takes about 5s to complete, and the other a hash join & heap scan that
takesabout 2hr.  This is using Postgres 9.3.3 on OS X 10.9.4. 

These two queries are different only in the value of an hstore entry and a date entry:

SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE
testruns.custom_data->'SpawnID'= 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03 10:49:55.338828'; 
SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE
testruns.custom_data->'SpawnID'= 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 18:39:55.258592'; 

We have built an index for the hstore entry for SpawnID and most queries use it correctly.

      Column      |            Type             |                       Modifiers
------------------+-----------------------------+-------------------------------------------------------
 id               | integer                     | not null default nextval('testruns_id_seq'::regclass)
 started_at       | timestamp without time zone | not null
...
Indexes:
...
    "index_testruns_on_custom_spawnid" btree ((custom_data -> 'SpawnID'::text))

However after realizing that these two queries differed in execution time so drastically, an explain showed that they
aretaking two very different paths: 

db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE
testruns.custom_data->'SpawnID'= 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03 10:49:55.338828'; 
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.85..2696.12 rows=88 width=1466)
   ->  Index Scan using index_testruns_on_custom_spawnid on testruns  (cost=0.57..2378.40 rows=88 width=1399)
         Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428842195.338828'::text)
         Filter: (started_at > '2014-08-03 10:49:55.338828'::timestamp without time zone)
   ->  Index Scan using test_types_pkey on test_types  (cost=0.29..3.60 rows=1 width=67)
         Index Cond: (id = testruns.test_type_id)
(6 rows)

Time: 22.363 ms
db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE
testruns.custom_data->'SpawnID'= 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 18:39:55.258592'; 
                                                                                 QUERY PLAN
                                                    

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=292249.24..348608.93 rows=28273 width=1466)
   Hash Cond: (testruns.test_type_id = test_types.id)
   ->  Bitmap Heap Scan on testruns  (cost=291510.31..347269.21 rows=28273 width=1399)
         Recheck Cond: (((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text) AND (started_at >
'2014-08-0318:39:55.258592'::timestamp without time zone)) 
         ->  BitmapAnd  (cost=291510.31..291510.31 rows=28273 width=0)
               ->  Bitmap Index Scan on index_testruns_on_custom_spawnid  (cost=0.00..41383.84 rows=500170 width=0)
                     Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text)
               ->  Bitmap Index Scan on index_testruns_on_started_at_2  (cost=0.00..250112.08 rows=3188736 width=0)
                     Index Cond: (started_at > '2014-08-03 18:39:55.258592'::timestamp without time zone)
   ->  Hash  (cost=563.41..563.41 rows=14041 width=67)
         ->  Seq Scan on test_types  (cost=0.00..563.41 rows=14041 width=67)
(11 rows)

Time: 1.231 ms


Mainly, I don't understand why two queries constructed the same but with subtle value differences would take two
differentpaths through the analyzer.  We haven't fiddled with the default planner options, aside from random_page_cost
andeffective_cache_size to match our hardware.  Re-analyzing the table does not seem to change the cost.  The index
sizeis somewhat large (16G for a 114G table), and I haven't tried rebuilding it yet. 

thanks,

    ~ john


pgsql-general by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: free RAM not being used for page cache
Next
From: Shaun Thomas
Date:
Subject: Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr