Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr - Mailing list pgsql-general

From john gale
Subject Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr
Date
Msg-id 42B101B3-980F-438D-AFF4-D7CDE4ABF757@smadness.com
In response to Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr  (Shaun Thomas <sthomas@optionshouse.com>)
Responses Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr  (Shaun Thomas <sthomas@optionshouse.com>)
Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
On Aug 5, 2014, at 1:26 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:

> On 08/05/2014 03:06 PM, john gale wrote:
>
>> Even on a 114G table with a 16G index, you would consider this slow?
>>  (physical disk space is closer to 800G, that was our high-water before
>> removing lots of rows and vacuuming, although it is running on SSD)
>
> Yes, actually. Especially now that you've told me you're on an SSD. While that's way over my comfort level since it complicates maintenance, table size shouldn't significantly impact index seek performance. What really matters is the number of rows matched, since that directly corresponds to how much work the database needs to do.
>
>>  Hash Join  (cost=292303.24..348662.93 rows=28273 width=1466) (actual
>> time=23875.727..824373.654 rows=335032 loops=1)
>
> Well, the estimated versus actual are off by an order of magnitude. Instead of 28k rows, it actually got 335k. However, in this case you'd want a hash or merge join, so that isn't your primary issue. I'd be more worried if we saw these numbers in a nested loop, but that isn't the case here.
>
> Still, what is your default_statistics_target set to?


Most of the planner options haven't diverged from default, so default_statistics_target is still set to 100.  I'm vaguely understanding the docs on that variable, but if we have the space it sounds like we should bump this up significantly to accommodate more statistics?
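From skimming the docs, it looks like this can be raised globally in postgresql.conf (plus a reload), or per-column for just the columns the planner is misestimating; something like the below, though the 1000 is a guess on my part and untested here:

    -- raise the sample size for one column, then refresh its stats
    ALTER TABLE testruns ALTER COLUMN started_at SET STATISTICS 1000;
    ANALYZE testruns;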


>>                ->  Bitmap Index Scan on
>> index_testruns_on_custom_spawnid  (cost=0.00..41437.84 rows=500170
>> width=0) (actual time=4872.404..4872.404 rows=2438520 loops=1)
>
> Ouch, ouch, and more ouch. Your index_testruns_on_custom_spawnid index matched 2.5 million rows...


Yes, although it's still better than the 40mil rows that we have in the table itself...

Also, that doesn't make sense to me, since we don't have 2.5mil rows that match this one SpawnID.  Could this suggest that my partial hstore index is somehow misconstructed?  Or is that saying that 2.5mil rows have a SpawnID, not all of which will be the one I'm looking for?

    "index_testruns_on_custom_spawnid" btree ((custom_data -> 'SpawnID'::text))


>>                ->  Bitmap Index Scan on
>> index_testruns_on_started_at_2  (cost=0.00..250112.08 rows=3188736
>> width=0) (actual time=18679.875..18679.875 rows=5822899 loops=1)
>
> And the index_testruns_on_started_at_2 index matched 5.8 million. Even the estimate guessed 3.1 million here, so at least it's the right scale. Still, this is a *ton* of data to verify with random seeks, even on an SSD.
>
>> Time: 824473.429 ms
>
> That's a time I would expect from wading through so much data. I'm suspicious of your queries and index selection. Are you regularly pulling and processing 300k records from a 100GB+ table? Since the results are about an order of magnitude less than the index matches, you need to really look at reducing your amount of matches before your run time will decrease by any significant amount.


Not necessarily 300k records, but yes we essentially have one 100G+ table that we pull results from to analyze, whether it's 30 records or 3k.  300k seems like an abnormality but that's probably why I jumped on this one particular query in the first place.


> I'm not entirely familiar with the most efficient manner to index hstore data, but someone here can pipe in. The stats and run times all look legitimate to me. The question is, can you reduce the number of matches through better index correlation, or more specific where clauses? The amount of data you're matching will necessarily require a lot of processing time.


Interestingly, making the where clause *less* specific by removing the started_at condition has sped this up quite a bit (if I'm reading the explain correctly, which I don't really have a history of doing):


db=# explain analyze SELECT *, testruns.id FROM testruns JOIN test_types ON testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' = 'SpawnID-428870395.258592';
                                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=739.49..313967.74 rows=319935 width=1466) (actual time=7.668..178409.506 rows=348672 loops=1)
   Hash Cond: (testruns.test_type_id = test_types.id)
   ->  Index Scan using index_testruns_on_custom_spawnid on testruns  (cost=0.57..306430.20 rows=319935 width=1399) (actual time=0.030..177067.785 rows=348672 loops=1)
         Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text)
   ->  Hash  (cost=563.41..563.41 rows=14041 width=67) (actual time=7.624..7.624 rows=14217 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 1380kB
         ->  Seq Scan on test_types  (cost=0.00..563.41 rows=14041 width=67) (actual time=0.003..2.547 rows=14217 loops=1)
 Total runtime: 178487.343 ms
(8 rows)

Time: 178488.516 ms


In this case the outer hash join's actual time is 178k ms vs the previous 824k, although the estimated cost for both plans seems roughly the same.

I've also been thinking about creating partial date indexes since that's one of the few easy buckets to shove things into, which seems like it would cut down on the searchable index size.
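
Roughly along these lines; a sketch with a made-up month bucket, and my understanding is a query would need a matching started_at condition in its where clause before the planner would consider the partial index:

    -- only index one month's slice of testruns
    CREATE INDEX index_testruns_on_custom_spawnid_2014_08
        ON testruns ((custom_data -> 'SpawnID'))
     WHERE started_at >= '2014-08-01' AND started_at < '2014-09-01';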

    ~ john

