Re: Ye olde slow query - Mailing list pgsql-performance

From Murphy, Kevin
Subject Re: Ye olde slow query
Date
Msg-id DF23153C-3DFD-4A76-B9A2-346E6B127752@email.chop.edu
In response to Re: Ye olde slow query  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Sorry for the delay; back on this, and thanks for the response.

On Mar 11, 2014, at 6:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Murphy, Kevin" <MURPHYKE@email.chop.edu> writes:
>> Synopsis: 8-table join with one "WHERE foo IN (...)" condition; works OK with fewer
>> than 5 items in the IN list, but at N=5, the planner starts using a compound index
>> for the first time that completely kills performance (5-6 minutes versus 0-12 seconds).
>> […]
>
> FWIW, I think the right question here is not "why is the slow query
> slow?", but "why is the fast query fast?”.
> […]
> It looks like the trouble spot is this intermediate nested loop:
>
>>                    ->  Nested Loop  (cost=4.32..283545.98 rows=80929 width=12) (actual time=163.609..571.237 rows=102 loops=1)
>>                          Buffers: shared hit=419 read=63
>>                          ->  Nested Loop  (cost=4.32..3426.09 rows=471 width=4) (actual time=93.595..112.404 rows=85 loops=1)
>>                              ...
>>                          ->  Index Scan using sample_result_variant_id on sample_result  (cost=0.00..593.01 rows=172 width=8) (actual time=5.147..5.397 rows=1 loops=85)
>>                                Index Cond: (variant_id = variant_effect.variant_id)
>>                                Buffers: shared hit=400 read=42
>
> which is creating the bulk of the estimated cost for the whole plan,
> but execution is actually pretty cheap.  There seem to be two components
> to the misestimation: one is that the sub-nested loop is producing about a
> fifth as many rows as expected,

This may be because 3 of the 4 user-supplied gene symbols were not present in the gene table at all.  Restricting to valid genes prior to the query is probably a good idea.
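A pre-check along these lines could whittle the user input down to symbols that actually exist before building the IN list (the column name "symbol" and the literal symbols are assumptions for illustration, not the real schema):

```sql
-- Keep only the user-supplied symbols that actually exist in the gene
-- table, so the main 8-table join never sees symbols that match nothing.
-- (Hypothetical column name "symbol" and placeholder values; adjust to
-- the real schema.)
SELECT g.symbol
FROM gene AS g
WHERE g.symbol IN ('SYMBOL1', 'SYMBOL2', 'SYMBOL3', 'SYMBOL4');
-- Feed only the surviving symbols into the IN (...) of the real query.
```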

> and the other is that the probes into
> sample_result are producing (on average) 1 row, not the 172 rows the
> planner expects.  If you could get the latter estimate to be even within
> one order of magnitude of reality, the planner would certainly see this
> plan as way cheaper than the other.

I’m not sure how to improve this.  The stats were up to date, with a global target of 5K, and raising them made no difference.  I tried increasing the statistics targets on the foreign keys involved to 10K (and analyzing), but the same costs and plan are in play.  I know the stats are current because I dumped and restored onto new hardware and did a vacuum analyze.  I previously mentioned that some of the vanilla n_distinct values were way off for the (790M-row) sample_result table, so I have taken to coercing n_distinct using negative multipliers.  This data doesn’t change very often (it hasn’t in many weeks).
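For reference, the two adjustments described above look like this in PostgreSQL (the -0.00057 multiplier is only an illustration, roughly 450K distinct variant_ids out of 790M rows; it is not the value from the actual system):

```sql
-- Raise the per-column statistics target for the join key.
ALTER TABLE sample_result ALTER COLUMN variant_id SET STATISTICS 10000;

-- Coerce n_distinct with a negative multiplier: a negative value is
-- interpreted as a fraction of reltuples, so the estimate scales as
-- the table grows.
ALTER TABLE sample_result ALTER COLUMN variant_id SET (n_distinct = -0.00057);

-- Neither setting takes effect until the next ANALYZE.
ANALYZE sample_result;
```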

There are 6M variants, but only 7.5% of them map to the sample_result table.  Presumably the planner knows this because of the n_distinct value on sample_result.variant_id?  Each variant maps to zero or more sample_result records, often very few, and never more than the number of samples (currently 1129).
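One way to confirm what the planner actually believes about that column is to inspect the standard pg_stats view directly:

```sql
-- Show the planner's current distinct-value estimate for the join key.
-- A negative n_distinct means "this fraction of the table's row count".
SELECT n_distinct, null_frac, correlation
FROM pg_stats
WHERE tablename = 'sample_result'
  AND attname = 'variant_id';
```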

>
> So I'm wondering if the stats on sample_result and variant_effect are up
> to date.  If they are, you might try increasing the stats targets for the
> variant_id columns.

As noted above, the stats were up to date at a global target of 5K, and increasing the targets on the variant_id foreign keys to 10K (and analyzing!) left the same costs and plan in play.  The n_distinct overrides on the 790M-row sample_result table are also in place after the dump/restore and vacuum analyze.

Regards,
Kevin


