Re: Planning performance problem (67626.278ms) - Mailing list pgsql-performance

From Jeremy Schneider
Subject Re: Planning performance problem (67626.278ms)
Date
Msg-id b145b428-3351-ec86-d522-100a5bf8ded7@amazon.com
Whole thread Raw
In response to Re: Planning performance problem (67626.278ms)  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Planning performance problem (67626.278ms)
List pgsql-performance
On 4/8/19 07:42, Justin Pryzby wrote:
> On Mon, Apr 08, 2019 at 04:33:34PM +0200, Pavel Stehule wrote:
>> po 8. 4. 2019 v 16:11 odesílatel Krzysztof Plocharz <plocharz@9livesdata.com> napsal:
>>
>>> We have some very strange query planning problem. Long story short it
>>> takes 67626.278ms just to plan. Query execution takes 12ms.
>>>
>>> Query has 7 joins and 2 subselects.
>>> It looks like the issue is not deterministic, sometimes is takes few ms
>>> to plan the query.
>>>
>>> One of the tables has 550,485,942 live tuples and 743,504,012 dead
>>> tuples. Running ANALYZE on that tables solves the problem only temporarily.
>>>
>>> Question is how can we debug what is going on?
>>
>> please check your indexes against bloating. Planner get min and max from
>> indexes and this operation is slow on bloat indexes.
> 
> I think that's from get_actual_variable_range(), right ?

For what it's worth, I have seen a similar issue on Aurora PG 9.6 where
query planning took a very long time (multiple minutes). In this
particular case, there wasn't anything Aurora-specific about the call to
get_actual_variable_range. We weren't able to distinctly identify the
root cause or build a reproducible test case -- but we suspect that an
inefficiency might exist in community PostgreSQL code.

For debugging, a few ideas:

1) capture a stack with pstack or perf record --call-graph

2) capture the execution plan of the SQL w slow planning

3) capture detailed stats for all relations and objects involved

4) capture the usual info for bug reporting (preface section in docs)

A reproducible test case is the gold standard; I'm keeping my eyes open
for another case too.

For the slow planning case that I saw, the slow process was almost
entirely in this call stack (captured with perf record --call-graph):
...
index_fetch_heap
index_getnext
get_actual_variable_range
ineq_histogram_selectivity
scalarineqsel
mergejoinscansel
initial_cost_mergejoin
try_mergejoin_path
add_paths_to_joinrel
make_join_rel
join_search_one_level
standard_join_search
make_one_rel
query_planner
...

-Jeremy


-- 
Jeremy Schneider
Database Engineer
Amazon Web Services



pgsql-performance by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Oracle to postgres migration
Next
From: Andres Freund
Date:
Subject: Re: Planning performance problem (67626.278ms)