Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr - Mailing list pgsql-general

From john gale
Subject Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr
Date
Msg-id BC933461-711A-4546-9CCE-B9FFDDA8874F@smadness.com
Whole thread Raw
In response to Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr  (Shaun Thomas <sthomas@optionshouse.com>)
Responses Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr  (Shaun Thomas <sthomas@optionshouse.com>)
List pgsql-general

On Aug 5, 2014, at 12:45 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:

Your EXPLAIN output basically answered this for you. Your fast query has this:

 Nested Loop  (cost=0.85..2696.12 rows=88 width=1466)

While your slow one has this:

 Hash Join  (cost=292249.24..348608.93 rows=28273 width=1466)

If this data is at all accurate (run an EXPLAIN ANALYZE to get the real numbers), the second query will return about 300x more rows than the first one. If your first query takes 5s, 20 minutes for the second isn't beyond the realm of possibility. The planner changed approaches because an efficient technique to join 88 rows is much different than what's needed for 300 times as many.

Given that, five seconds for 88 rows is astonishingly slow, even for hstore. I'd be willing to bet something else is going on here.


Even on a 114G table with a 16G index, you would consider this slow?  (physical disk space is closer to 800G, that was our high-water before removing lots of rows and vacuuming, although it is running on SSD)

Re-running the query always makes it much faster, but these are generally just run once, and the duration log says the fast queries usually take between 2s - 5s the first time.  (I'm assuming that's because we have a very large effective_cache_size...?)

The explain analyze finished and suggests the initial row count was correct:

db=# explain analyze SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' = 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 18:39:55.258592';
                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=292303.24..348662.93 rows=28273 width=1466) (actual time=23875.727..824373.654 rows=335032 loops=1)
   Hash Cond: (testruns.test_type_id = test_types.id)
   ->  Bitmap Heap Scan on testruns  (cost=291564.31..347323.21 rows=28273 width=1399) (actual time=23868.077..823271.058 rows=335032 loops=1)
         Recheck Cond: (((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text) AND (started_at > '2014-08-03 18:39:55.258592'::timestamp without time zone))
         Rows Removed by Index Recheck: 1867456
         ->  BitmapAnd  (cost=291564.31..291564.31 rows=28273 width=0) (actual time=23843.870..23843.870 rows=0 loops=1)
               ->  Bitmap Index Scan on index_testruns_on_custom_spawnid  (cost=0.00..41437.84 rows=500170 width=0) (actual time=4872.404..4872.404 rows=2438520 loops=1)
                     Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text)
               ->  Bitmap Index Scan on index_testruns_on_started_at_2  (cost=0.00..250112.08 rows=3188736 width=0) (actual time=18679.875..18679.875 rows=5822899 loops=1)
                     Index Cond: (started_at > '2014-08-03 18:39:55.258592'::timestamp without time zone)
   ->  Hash  (cost=563.41..563.41 rows=14041 width=67) (actual time=7.635..7.635 rows=14215 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 1380kB
         ->  Seq Scan on test_types  (cost=0.00..563.41 rows=14041 width=67) (actual time=0.004..2.639 rows=14215 loops=1)
 Total runtime: 824471.902 ms
(14 rows)

Time: 824473.429 ms

~ john

pgsql-general by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr
Next
From: Shaun Thomas
Date:
Subject: Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr