Thread: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr

I would be appreciative if somebody could help explain why we have two nearly identical queries taking two different
planner routes; one a nested index loop that takes about 5s to complete, and the other a hash join & heap scan that
takes about 2hr.  This is using Postgres 9.3.3 on OS X 10.9.4.

These two queries are different only in the value of an hstore entry and a date entry:

SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE
testruns.custom_data->'SpawnID' = 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03 10:49:55.338828';
SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE
testruns.custom_data->'SpawnID' = 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 18:39:55.258592';

We have built an index for the hstore entry for SpawnID and most queries use it correctly.

      Column      |            Type             |                       Modifiers
------------------+-----------------------------+-------------------------------------------------------
 id               | integer                     | not null default nextval('testruns_id_seq'::regclass)
 started_at       | timestamp without time zone | not null
...
Indexes:
...
    "index_testruns_on_custom_spawnid" btree ((custom_data -> 'SpawnID'::text))

However, after realizing that these two queries differed in execution time so drastically, an explain showed that they
are taking two very different paths:

db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE
testruns.custom_data->'SpawnID' = 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03 10:49:55.338828';
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.85..2696.12 rows=88 width=1466)
   ->  Index Scan using index_testruns_on_custom_spawnid on testruns  (cost=0.57..2378.40 rows=88 width=1399)
         Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428842195.338828'::text)
         Filter: (started_at > '2014-08-03 10:49:55.338828'::timestamp without time zone)
   ->  Index Scan using test_types_pkey on test_types  (cost=0.29..3.60 rows=1 width=67)
         Index Cond: (id = testruns.test_type_id)
(6 rows)

Time: 22.363 ms
db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE
testruns.custom_data->'SpawnID' = 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 18:39:55.258592';
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=292249.24..348608.93 rows=28273 width=1466)
   Hash Cond: (testruns.test_type_id = test_types.id)
   ->  Bitmap Heap Scan on testruns  (cost=291510.31..347269.21 rows=28273 width=1399)
         Recheck Cond: (((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text) AND (started_at > '2014-08-03 18:39:55.258592'::timestamp without time zone))
         ->  BitmapAnd  (cost=291510.31..291510.31 rows=28273 width=0)
               ->  Bitmap Index Scan on index_testruns_on_custom_spawnid  (cost=0.00..41383.84 rows=500170 width=0)
                     Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text)
               ->  Bitmap Index Scan on index_testruns_on_started_at_2  (cost=0.00..250112.08 rows=3188736 width=0)
                     Index Cond: (started_at > '2014-08-03 18:39:55.258592'::timestamp without time zone)
   ->  Hash  (cost=563.41..563.41 rows=14041 width=67)
         ->  Seq Scan on test_types  (cost=0.00..563.41 rows=14041 width=67)
(11 rows)

Time: 1.231 ms


Mainly, I don't understand why two queries constructed the same but with subtle value differences would take two
different paths through the planner.  We haven't fiddled with the default planner options, aside from random_page_cost
and effective_cache_size to match our hardware.  Re-analyzing the table does not seem to change the cost.  The index
size is somewhat large (16G for a 114G table), and I haven't tried rebuilding it yet.

thanks,

    ~ john


On 08/05/2014 02:16 PM, john gale wrote:

Your EXPLAIN output basically answered this for you. Your fast query has
this:

>   Nested Loop  (cost=0.85..2696.12 rows=88 width=1466)

While your slow one has this:

>   Hash Join  (cost=292249.24..348608.93 rows=28273 width=1466)

If this data is at all accurate (run an EXPLAIN ANALYZE to get the real
numbers), the second query will return about 300x more rows than the
first one. If your first query takes 5s, 20 minutes for the second isn't
beyond the realm of possibility. The planner changed approaches because
an efficient technique to join 88 rows is much different than what's
needed for 300 times as many.

Given that, five seconds for 88 rows is astonishingly slow, even for
hstore. I'd be willing to bet something else is going on here.

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com




On Aug 5, 2014, at 12:45 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:

Your EXPLAIN output basically answered this for you. Your fast query has this:

 Nested Loop  (cost=0.85..2696.12 rows=88 width=1466)

While your slow one has this:

 Hash Join  (cost=292249.24..348608.93 rows=28273 width=1466)

If this data is at all accurate (run an EXPLAIN ANALYZE to get the real numbers), the second query will return about 300x more rows than the first one. If your first query takes 5s, 20 minutes for the second isn't beyond the realm of possibility. The planner changed approaches because an efficient technique to join 88 rows is much different than what's needed for 300 times as many.

Given that, five seconds for 88 rows is astonishingly slow, even for hstore. I'd be willing to bet something else is going on here.


Even on a 114G table with a 16G index, you would consider this slow?  (physical disk space is closer to 800G, that was our high-water before removing lots of rows and vacuuming, although it is running on SSD)

Re-running the query always makes it much faster, but these are generally just run once, and the duration log says the fast queries usually take between 2s - 5s the first time.  (I'm assuming that's because we have a very large effective_cache_size...?)

The explain analyze finished and suggests the initial row count was correct:

db=# explain analyze SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' = 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 18:39:55.258592';
                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=292303.24..348662.93 rows=28273 width=1466) (actual time=23875.727..824373.654 rows=335032 loops=1)
   Hash Cond: (testruns.test_type_id = test_types.id)
   ->  Bitmap Heap Scan on testruns  (cost=291564.31..347323.21 rows=28273 width=1399) (actual time=23868.077..823271.058 rows=335032 loops=1)
         Recheck Cond: (((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text) AND (started_at > '2014-08-03 18:39:55.258592'::timestamp without time zone))
         Rows Removed by Index Recheck: 1867456
         ->  BitmapAnd  (cost=291564.31..291564.31 rows=28273 width=0) (actual time=23843.870..23843.870 rows=0 loops=1)
               ->  Bitmap Index Scan on index_testruns_on_custom_spawnid  (cost=0.00..41437.84 rows=500170 width=0) (actual time=4872.404..4872.404 rows=2438520 loops=1)
                     Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text)
               ->  Bitmap Index Scan on index_testruns_on_started_at_2  (cost=0.00..250112.08 rows=3188736 width=0) (actual time=18679.875..18679.875 rows=5822899 loops=1)
                     Index Cond: (started_at > '2014-08-03 18:39:55.258592'::timestamp without time zone)
   ->  Hash  (cost=563.41..563.41 rows=14041 width=67) (actual time=7.635..7.635 rows=14215 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 1380kB
         ->  Seq Scan on test_types  (cost=0.00..563.41 rows=14041 width=67) (actual time=0.004..2.639 rows=14215 loops=1)
 Total runtime: 824471.902 ms
(14 rows)

Time: 824473.429 ms

~ john
On 08/05/2014 03:06 PM, john gale wrote:

> Even on a 114G table with a 16G index, you would consider this slow?
>   (physical disk space is closer to 800G, that was our high-water before
> removing lots of rows and vacuuming, although it is running on SSD)

Yes, actually. Especially now that you've told me you're on an SSD.
While that's way over my comfort level since it complicates maintenance,
table size shouldn't significantly impact index seek performance. What
really matters is the number of rows matched, since that directly
corresponds to how much work the database needs to do.

>   Hash Join  (cost=292303.24..348662.93 rows=28273 width=1466) (actual
> time=23875.727..824373.654 rows=335032 loops=1)

Well, the estimated versus actual are off by an order of magnitude.
Instead of 28k rows, it actually got 335k. However, in this case you'd
want a hash or merge join, so that isn't your primary issue. I'd be more
worried if we saw these numbers in a nested loop, but that isn't the
case here.

Still, what is your default_statistics_target set to?

>                 ->  Bitmap Index Scan on
> index_testruns_on_custom_spawnid  (cost=0.00..41437.84 rows=500170
> width=0) (actual time=4872.404..4872.404 rows=2438520 loops=1)

Ouch, ouch, and more ouch. Your index_testruns_on_custom_spawnid index
matched 2.5 million rows...

>                 ->  Bitmap Index Scan on
> index_testruns_on_started_at_2  (cost=0.00..250112.08 rows=3188736
> width=0) (actual time=18679.875..18679.875 rows=5822899 loops=1)

And the index_testruns_on_started_at_2 index matched 5.8 million. Even
the estimate guessed 3.1 million here, so at least it's the right scale.
Still, this is a *ton* of data to verify with random seeks, even on an SSD.

> Time: 824473.429 ms

That's a time I would expect from wading through so much data. I'm
suspicious of your queries and index selection. Are you regularly
pulling and processing 300k records from a 100GB+ table? Since the
results are about an order of magnitude less than the index matches, you
need to really look at reducing your amount of matches before your run
time will decrease by any significant amount.

I'm not entirely familiar with the most efficient manner to index hstore
data, but someone here can pipe in. The stats and run times all look
legitimate to me. The question is, can you reduce the number of matches
through better index correlation, or more specific where clauses? The
amount of data you're matching will necessarily require a lot of
processing time.

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



On Aug 5, 2014, at 1:26 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:

> On 08/05/2014 03:06 PM, john gale wrote:
>
>> Even on a 114G table with a 16G index, you would consider this slow?
>>  (physical disk space is closer to 800G, that was our high-water before
>> removing lots of rows and vacuuming, although it is running on SSD)
>
> Yes, actually. Especially now that you've told me you're on an SSD. While that's way over my comfort level since it
> complicates maintenance, table size shouldn't significantly impact index seek performance. What really matters is the
> number of rows matched, since that directly corresponds to how much work the database needs to do.
>
>>  Hash Join  (cost=292303.24..348662.93 rows=28273 width=1466) (actual
>> time=23875.727..824373.654 rows=335032 loops=1)
>
> Well, the estimated versus actual are off by an order of magnitude. Instead of 28k rows, it actually got 335k.
> However, in this case you'd want a hash or merge join, so that isn't your primary issue. I'd be more worried if we saw
> these numbers in a nested loop, but that isn't the case here.
>
> Still, what is your default_statistics_target set to?


Most of the planner options haven't diverged from default, so default_statistics_target is still set to 100.  I'm
vaguely understanding the docs on that variable, but if we have the space it sounds like we should bump this up
significantly to accommodate more statistics?


>>                ->  Bitmap Index Scan on
>> index_testruns_on_custom_spawnid  (cost=0.00..41437.84 rows=500170
>> width=0) (actual time=4872.404..4872.404 rows=2438520 loops=1)
>
> Ouch, ouch, and more ouch. Your index_testruns_on_custom_spawnid index matched 2.5 million rows...


Yes, although it's still better than the 40mil rows that we have in the table itself...

Also, that doesn't make sense to me, since we don't have 2.5mil rows that match this one SpawnID.  Could this suggest
that my partial hstore index is somehow misconstructed?  Or is that saying that 2.5mil rows have a SpawnID, not all of
which will be the one I'm looking for?

    "index_testruns_on_custom_spawnid" btree ((custom_data -> 'SpawnID'::text))


>>                ->  Bitmap Index Scan on
>> index_testruns_on_started_at_2  (cost=0.00..250112.08 rows=3188736
>> width=0) (actual time=18679.875..18679.875 rows=5822899 loops=1)
>
> And the index_testruns_on_started_at_2 index matched 5.8 million. Even the estimate guessed 3.1 million here, so at
> least it's the right scale. Still, this is a *ton* of data to verify with random seeks, even on an SSD.
>
>> Time: 824473.429 ms
>
> That's a time I would expect from wading through so much data. I'm suspicious of your queries and index selection.
> Are you regularly pulling and processing 300k records from a 100GB+ table? Since the results are about an order of
> magnitude less than the index matches, you need to really look at reducing your amount of matches before your run time
> will decrease by any significant amount.


Not necessarily 300k records, but yes we essentially have one 100G+ table that we pull results to analyze from, whether
it's 30 records or 3k.  300k seems like an abnormality but that's probably why I jumped on this one particular query in
the first place.


> I'm not entirely familiar with the most efficient manner to index hstore data, but someone here can pipe in. The
> stats and run times all look legitimate to me. The question is, can you reduce the number of matches through better
> index correlation, or more specific where clauses? The amount of data you're matching will necessarily require a lot
> of processing time.


Interestingly, making the query *less* specific by removing the started_at part has sped this up quite a bit (if
I'm reading the explain correctly, which I don't really have a history of doing):


db=# explain analyze SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE
testruns.custom_data->'SpawnID' = 'SpawnID-428870395.258592';
                                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=739.49..313967.74 rows=319935 width=1466) (actual time=7.668..178409.506 rows=348672 loops=1)
   Hash Cond: (testruns.test_type_id = test_types.id)
   ->  Index Scan using index_testruns_on_custom_spawnid on testruns  (cost=0.57..306430.20 rows=319935 width=1399) (actual time=0.030..177067.785 rows=348672 loops=1)
         Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text)
   ->  Hash  (cost=563.41..563.41 rows=14041 width=67) (actual time=7.624..7.624 rows=14217 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 1380kB
         ->  Seq Scan on test_types  (cost=0.00..563.41 rows=14041 width=67) (actual time=0.003..2.547 rows=14217 loops=1)
 Total runtime: 178487.343 ms
(8 rows)

Time: 178488.516 ms


In this case the total runtime is 178k ms vs the previous 824k ms, although the cost estimates for both plans seem roughly the same.

I've also been thinking about creating partial date indexes since that's one of the few easy buckets to shove things
into, which seems like it would cut down on the searchable index size.

    ~ john


On 08/05/2014 04:08 PM, john gale wrote:

> Most of the planner options haven't diverged from default, so
> default_statistics_target is still set to 100.  I'm vaguely
> understanding the docs on that variable, but if we have the space it
> sounds like we should bump this up significantly to accommodate more
> statistics?

Yeah. You'll want to bump that up. However, I'm not sure how much of a
difference that'll make to the hstore stuff due to the blobby nature of
that type of data.

> Also, that doesn't make sense to me, since we don't have 2.5mil rows
> that match this one SpawnID.  Could this suggest that my partial
> hstore index is somehow misconstructed?  Or is that saying that
> 2.5mil rows have a SpawnID, not all of which will be the one I'm
> looking for?

I'm not sure. But according to that explain analyze, it matched 2.5
million... somethings. Perhaps it has to do with how hstore is actually
indexed. Maybe it starts at SpawnID, then narrows it down with
SpawnID-428870395.258592, but can't do so until it fetches the SpawnID
part. I'll stop commenting on hstore anything, because I never use it.

> Not necessarily 300k records, but yes we essentially have one 100G+
> table that we pull results to analyze from, whether it's 30 records
> or 3k.  300k seems like an abnormality but that's probably why I
> jumped on this one particular query in the first place.

Yeah. Like I said, it's all of that preliminary work that's inflating
your execution time. Something tells me that better stats might help you
out. Increase default_statistics_target to 300 or 400, analyze the
table, and try again. See what happens.
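
For what it's worth, a minimal sketch of that (untested; 400 is just the figure
suggested above, and "db" is the database name from your prompt):

  -- raise the default for the whole database (takes effect in new sessions)...
  ALTER DATABASE db SET default_statistics_target = 400;
  -- ...or raise it just for the started_at column, then rebuild statistics:
  ALTER TABLE testruns ALTER COLUMN started_at SET STATISTICS 400;
  ANALYZE testruns;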

> Interestingly, making the query *less* specific by removing the
> started_at part has sped this up quite a bit (if I'm reading the
> explain correctly, which I don't really have a history of doing):

You might think that, but not really. :) Consider this:

> ->  Index Scan using index_testruns_on_custom_spawnid on testruns
> (cost=0.57..306430.20 rows=319935 width=1399) (actual
> time=0.030..177067.785 rows=348672 loops=1)

That's the only index scan it applied. By not using the much larger
index with far more hits (5-million for the previous "more specific"
query) it did less work. Something tells me that the
index_testruns_on_started_at_2 index is confusing the planner into doing
way more than it should.

You might be better off by using this query in a CTE (WITH syntax) and
then using a post-processing WHERE clause to filter to the proper time
range. Tricking the planner that way is kinda gross, but it works. This
is assuming your stat adjustment doesn't work.
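
Roughly this shape, for example (a sketch only, reusing the second query's
values; in 9.3 the CTE acts as an optimizer fence, so only the SpawnID index
is considered inside it):

  WITH spawn AS (
      SELECT * FROM testruns
      WHERE custom_data -> 'SpawnID' = 'SpawnID-428870395.258592'
  )
  SELECT *, spawn.id
  FROM spawn
  JOIN test_types ON spawn.test_type_id = test_types.id
  WHERE spawn.started_at > '2014-08-03 18:39:55.258592';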

> I've also been thinking about creating partial date indexes since
> that's one of the few easy buckets to shove things into, which seems
> like it would cut down on the searchable index size.

Properly applying partial indexes without overdoing it is a delicate
dance, but it can pay off in spades. Try it, it might work out. ;)
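
For illustration, one hypothetical shape of such an index, bucketed by month
(the planner will only use it when the query's started_at condition provably
implies the index predicate):

  CREATE INDEX index_testruns_custom_spawnid_2014_08
      ON testruns ((custom_data -> 'SpawnID'::text))
      WHERE started_at >= '2014-08-01' AND started_at < '2014-09-01';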

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Shaun Thomas-2 wrote
>> Also, that doesn't make sense to me, since we don't have 2.5mil rows
>> that match this one SpawnID.  Could this suggest that my partial
>> hstore index is somehow misconstructed?  Or is that saying that
>> 2.5mil rows have a SpawnID, not all of which will be the one I'm
>> looking for?
>
> I'm not sure. But according to that explain analyze, it matched 2.5
> million... somethings. Perhaps it has to do with how hstore is actually
> indexed. Maybe it starts at SpawnID, then narrows it down with
> SpawnID-428870395.258592, but can't do so until it fetches the SpawnID
> part. I'll stop commenting on hstore anything, because I never use it.

The presence of hstore is a red herring here.  The index itself is storing
the result of "hstore -> 'key'", which has a data type of "text".  Note that
the use of a "btree" index reinforces this since (IIRC) a direct hstore
index would require gin/gist.

*Recommendation:* Create a multi-column index instead of (or in addition to)
the two single column ones.

Theorizing out loud here...take with a grain of salt - and experiment.

What is happening is that the entire (id) index is being bitmapped (hashed,
basically) so that it can be combined with the other (date) index (also
being bitmapped).  Since this process operates at the page level the bitmap
of an ID that is frequently used throughout time is going to have a very low
hit ratio (matched tuples / pages) and also result in a high recheck loss
rate.  The ID seems likely to have this problem - the Date index likely not
so much.

NOTE: I am confused by this line:
->  BitmapAnd  (cost=291564.31..291564.31 rows=28273 width=0) (actual
time=23843.870..23843.870 rows=0 loops=1)

How did actual match zero rows?  It should be something like 2.2M

Anyway, you should probably experiment with creating a multi-column index
instead of allowing PostgreSQL to BitmapAnd them together.  Likely the
timestamp will have higher cardinality and so should be listed first in the
index.  This is the only way it can combine two indexes and it thinks this
is better than using the "Date" index and then scanning every single page
for matching IDs.

Experimentation (or more knowledgeable insight) into this particular
decision is worthwhile...

I would probably still increase statistic targets but I do not think that
will help as much as a compound index.

Note that using a CTE (which is kinda like an explicit BitmapAnd) is not
likely to be any better than an implicit BitmapAnd - but it doesn't hurt to
check.

I'm not sure how a partial index would help here...though maybe
partitioning...especially since a partial index on date doesn't typically
make sense since time is constantly changing.

David J.







Hello,

> "index_testruns_on_custom_spawnid" btree ((custom_data -> 'SpawnID'::text))
> ..
> WHERE testruns.custom_data->'SpawnID' = 'SpawnID-428842195.338828'
> ...

If all your SpawnIDs have this prefix, you may consider removing it from your index to reduce its size:


=> "index_testruns_on_custom_spawnid" btree ((substring(custom_data -> 'SpawnID'::text) , 9))
you must of course then use exactly this clause within your query:

=>  WHERE substring(testruns.custom_data->'SpawnID' , 9) = '428842195.338828'
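
As plain DDL that would be something like this (index name made up; position 9
assumes the fixed 8-character 'SpawnID-' prefix):

  CREATE INDEX index_testruns_on_custom_spawnid_short
      ON testruns ((substring(custom_data -> 'SpawnID', 9)));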

And as it now looks like a number, you could maybe go a step further, depending on which patterns these SpawnIDs can have.
The given value can for example be stored as two int4 or one int8: ('428842195.338828'::numeric * 1000000)::int8


On the other hand this will only address a limited part of your performance issue.

regards,

Marc Mamin




On Tue, Aug 5, 2014 at 2:08 PM, john gale <john@smadness.com> wrote:

>>                ->  Bitmap Index Scan on
>> index_testruns_on_custom_spawnid  (cost=0.00..41437.84 rows=500170
>> width=0) (actual time=4872.404..4872.404 rows=2438520 loops=1)
>
> Ouch, ouch, and more ouch. Your index_testruns_on_custom_spawnid index matched 2.5 million rows...


Yes, although it's still better than the 40mil rows that we have in the table itself...

Also, that doesn't make sense to me, since we don't have 2.5mil rows that match this one SpawnID.  Could this suggest that my partial hstore index is somehow misconstructed?  Or is that saying that 2.5mil rows have a SpawnID, not all of which will be the one I'm looking for?

Have you triple-checked that for 'SpawnID-428870395.258592'?

That seems like something a human is much more likely to get wrong than a computer is.

Anyway, it seems like a compound index on ((custom_data -> 'SpawnID'::text), started_at) could do wonders for this query.
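
Something along these lines, for instance (untested, index name made up):

  CREATE INDEX index_testruns_on_custom_spawnid_started_at
      ON testruns ((custom_data -> 'SpawnID'::text), started_at);

That way the equality on SpawnID and the range on started_at can be satisfied
by a single index scan instead of BitmapAnd-ing two separate indexes.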

Cheers,

Jeff 

On Tue, Aug 5, 2014 at 4:30 PM, David G Johnston <david.g.johnston@gmail.com> wrote:

NOTE: I am confused by this line:
->  BitmapAnd  (cost=291564.31..291564.31 rows=28273 width=0) (actual
time=23843.870..23843.870 rows=0 loops=1)

How did actual match zero rows?  It should be something like 2.2M

The accounting for bitmap operations seems to be a bit of a mess.  In some cases, it reports the number of rows represented in the bitmap.  Sometimes it counts a bitmap itself as a row, and so there is just one of them no matter how many rows it represents.  In this case, it seems to consider a bitmap not to be a row at all.  The problem with counting the number of rows represented by the bitmap is that that value is unknown if either of the input bitmaps has gone lossy.



Anyway, you should probably experiment with creating a multi-column index
instead of allowing PostgreSQL to BitmapAnd them together.  Likely the
timestamp will have higher cardinality and so should be listed first in the
index.  

No, the timestamp should almost certainly come second because it is used with inequality operators.

Cheers,

Jeff



On Wed, Aug 6, 2014 at 10:08 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Aug 5, 2014 at 4:30 PM, David G Johnston <david.g.johnston@gmail.com> wrote:

Anyway, you should probably experiment with creating a multi-column index
instead of allowing PostgreSQL to BitmapAnd them together.  Likely the
timestamp will have higher cardinality and so should be listed first in the
index.  

No, the timestamp should almost certainly come second because it is used with inequality operators.


Wouldn't that only matter if a typical inequality was expected to return more rows than a given equality on the other field?  Depending on the cardinality of the ID field I would expect a very large range of dates to be required before digging down into ID becomes more effective.  My instinct says there are relatively few IDs in play but that they are continually adding new rows.

What statistics would the OP have to provide in order to actually make a fact-based determination?

David J.