Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr - Mailing list pgsql-general

From Marc Mamin
Subject Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr
Date
Msg-id B6F6FD62F2624C4C9916AC0175D56D8828AC4799@jenmbs01.ad.intershop.net
Whole thread Raw
In response to understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr  (john gale <john@smadness.com>)
List pgsql-general
Hello,

> "index_testruns_on_custom_spawnid" btree ((custom_data -> 'SpawnID'::text))
> ..
> WHERE testruns.custom_data->'SpawnID' = 'SpawnID-428842195.338828'
> ...

If all your SpawnID have this prefix, you may consider remove it from your index to reduce its size:


=> "index_testruns_on_custom_spawnid" btree ((substring(custom_data -> 'SpawnID'::text) , 9))
you must of course then use exactly this clause within your query:

=>  WHERE substring(testruns.custom_data->'SpawnID' , 9) = '428842195.338828'

And as it now looks like a number, maybe go a step further depending on which patterns these SpawnID can have.
This given value can for example be stored as 2 int4 or one int8: ('428842195.338828'::numeric * 1000000)::int8


On the other hand this will only address a limited part of your performance issue.

regards,

Marc Mamin


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of john gale
> Sent: Dienstag, 5. August 2014 21:17
> To: pgsql-general@postgresql.org general
> Subject: [GENERAL] understanding why two nearly identical queries take two
> different planner routes, one 5s and one 2hr
>
>
> I would be appreciative if somebody could help explain why we have two nearly
> identical queries taking two different planner routes;  one a nested index
> loop that takes about 5s to complete, and the other a hash join & heap scan
> that takes about 2hr.  This is using Postgres 9.3.3 on OS X 10.9.4.
>
> These two queries are different only in the value of an hstore entry and a
> date entry:
>
> SELECT *, testruns.id FROM testruns JOIN test_types ON
> testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
> 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03
> 10:49:55.338828'; SELECT *, testruns.id FROM testruns JOIN test_types ON
> testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
> 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03
> 18:39:55.258592';
>
> We have built an index for the hstore entry for SpawnID and most queries use
> it correctly.
>
>       Column      |            Type             |
> Modifiers
> ------------------+-----------------------------+-----------------------
> ------------------+-----------------------------+-----------------------
> ------------------+-----------------------------+---------
>  id               | integer                     | not null default
> nextval('testruns_id_seq'::regclass)
>  started_at       | timestamp without time zone | not null
> ...
> Indexes:
> ...
>     "index_testruns_on_custom_spawnid" btree ((custom_data ->
> 'SpawnID'::text))
>
> However after realizing that these two queries differed in execution time so
> drastically, an explain showed that they are taking two very different paths:
>
> db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON
> testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
> 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03
> 10:49:55.338828';
>                                                   QUERY PLAN
> ------------------------------------------------------------------------------
> --------------------------------
>  Nested Loop  (cost=0.85..2696.12 rows=88 width=1466)
>    ->  Index Scan using index_testruns_on_custom_spawnid on testruns
> (cost=0.57..2378.40 rows=88 width=1399)
>          Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-
> 428842195.338828'::text)
>          Filter: (started_at > '2014-08-03 10:49:55.338828'::timestamp without
> time zone)
>    ->  Index Scan using test_types_pkey on test_types  (cost=0.29..3.60 rows=1
> width=67)
>          Index Cond: (id = testruns.test_type_id)
> (6 rows)
>
> Time: 22.363 ms
> db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON
> testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
> 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03
> 18:39:55.258592';
>
> QUERY PLAN
> ------------------------------------------------------------------------------
> ------------------------------------------------------------------------------
> ----------------
>  Hash Join  (cost=292249.24..348608.93 rows=28273 width=1466)
>    Hash Cond: (testruns.test_type_id = test_types.id)
>    ->  Bitmap Heap Scan on testruns  (cost=291510.31..347269.21 rows=28273
> width=1399)
>          Recheck Cond: (((custom_data -> 'SpawnID'::text) = 'SpawnID-
> 428870395.258592'::text) AND (started_at > '2014-08-03
> 18:39:55.258592'::timestamp without time zone))
>          ->  BitmapAnd  (cost=291510.31..291510.31 rows=28273 width=0)
>                ->  Bitmap Index Scan on index_testruns_on_custom_spawnid
> (cost=0.00..41383.84 rows=500170 width=0)
>                      Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-
> 428870395.258592'::text)
>                ->  Bitmap Index Scan on index_testruns_on_started_at_2
> (cost=0.00..250112.08 rows=3188736 width=0)
>                      Index Cond: (started_at > '2014-08-03
> 18:39:55.258592'::timestamp without time zone)
>    ->  Hash  (cost=563.41..563.41 rows=14041 width=67)
>          ->  Seq Scan on test_types  (cost=0.00..563.41 rows=14041 width=67)
> (11 rows)
>
> Time: 1.231 ms
>
>
> Mainly, I don't understand why two queries constructed the same but with
> subtle value differences would take two different paths through the analyzer.
> We haven't fiddled with the default planner options, aside from
> random_page_cost and effective_cache_size to match our hardware.  Re-analyzing
> the table does not seem to change the cost.  The index size is somewhat large
> (16G for a 114G table), and I haven't tried rebuilding it yet.
>
> thanks,
>
>     ~ john
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


pgsql-general by date:

Previous
From: David G Johnston
Date:
Subject: Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr
Next
From: M Tarkeshwar Rao
Date:
Subject: Adding 3 hours while inserting data into table