Re: Planning performance problem (67626.278ms) - Mailing list pgsql-performance

From Andres Freund
Subject Re: Planning performance problem (67626.278ms)
Date
Msg-id 20190408232644.r6xixkeulsmomwfj@alap3.anarazel.de
Whole thread Raw
In response to Re: Planning performance problem (67626.278ms)  (Jeremy Schneider <schnjere@amazon.com>)
Responses Re: Planning performance problem (67626.278ms)  (Jeremy Schneider <schnjere@amazon.com>)
List pgsql-performance
Hi,

On 2019-04-08 16:10:17 -0700, Jeremy Schneider wrote:
> On 4/8/19 07:42, Justin Pryzby wrote:
> > On Mon, Apr 08, 2019 at 04:33:34PM +0200, Pavel Stehule wrote:
> >> po 8. 4. 2019 v 16:11 odesílatel Krzysztof Plocharz <plocharz@9livesdata.com> napsal:
> >>
> >>> We have some very strange query planning problem. Long story short it
> >>> takes 67626.278ms just to plan. Query execution takes 12ms.
> >>>
> >>> Query has 7 joins and 2 subselects.
> >>> It looks like the issue is not deterministic, sometimes is takes few ms
> >>> to plan the query.
> >>>
> >>> One of the tables has 550,485,942 live tuples and 743,504,012 dead
> >>> tuples. Running ANALYZE on that tables solves the problem only temporarily.
> >>>
> >>> Question is how can we debug what is going on?
> >>
> >> please check your indexes against bloating. Planner get min and max from
> >> indexes and this operation is slow on bloat indexes.
> > 
> > I think that's from get_actual_variable_range(), right ?
> 
> For what it's worth, I have seen a similar issue on Aurora PG 9.6 where
> query planning took a very long time (multiple minutes). In this
> particular case, there wasn't anything Aurora-specific about the call to
> get_actual_variable_range. We weren't able to distinctly identify the
> root cause or build a reproducible test case -- but we suspect that an
> inefficiency might exist in community PostgreSQL code.
> 
> For debugging, a few ideas:
> 
> 1) capture a stack with pstack or perf record --call-graph
> 
> 2) capture the execution plan of the SQL w slow planning
> 
> 3) capture detailed stats for all relations and objects involved
> 
> 4) capture the usual info for bug reporting (preface section in docs)
> 
> A reproducible test case is the gold standard; I'm keeping my eyes open
> for another case too.
> 
> For the slow planning case that I saw, the slow process was almost
> entirely in this call stack (captured with perf record --call-graph):
> ...
> index_fetch_heap
> index_getnext
> get_actual_variable_range
> ineq_histogram_selectivity
> scalarineqsel
> mergejoinscansel
> initial_cost_mergejoin
> try_mergejoin_path
> add_paths_to_joinrel
> make_join_rel
> join_search_one_level
> standard_join_search
> make_one_rel
> query_planner
> ...

I suspect some of this might be related to < 11 not having the following
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd

Greetings,

Andres Freund



pgsql-performance by date:

Previous
From: Jeremy Schneider
Date:
Subject: Re: Planning performance problem (67626.278ms)
Next
From: Duncan Kinnear
Date:
Subject: Re: Commit(?) overhead