Re: Postgresql planning time too high - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Postgresql planning time too high
Date
Msg-id 20191122193602.ljivcdonmmww3nm3@development
Whole thread Raw
In response to Re[2]: Postgresql planning time too high  ("Sterpu Victor" <victor@caido.ro>)
List pgsql-performance
On Fri, Nov 22, 2019 at 11:44:51AM +0000, Sterpu Victor wrote:
>No rows should be returned, DB is empty.
>I'm testing now on a empty DB trying to find out how to improve this.
>

I'm a bit puzzled why you're doinf tests on an empty database, when in
production it'll certainly contain data. I guess you're assuming that
this way you isolate planning time, which should remain about the same
even with data loaded, but I'm not entirely sure that's true - all this
planning is done with no statistics (histograms, MCV lists, ...) and
maybe it's forcing the planner to do more work? I wouldn't be surprised
if having those stats would allow the planner to take some shortcuts,
cutting the plannnig time down.

Not to mention that we don't know if the plan is actually any good, for
all what we know it might take 10 years on real data, making the
planning duration irrelevant.


Let's put that aside, though. Let's assume it's because of expensive
join order planning. I don't think you have a lot of options, here,
unfortunately.

One option is to try reducing the planner options that determine how
much effort should be spent on join planning, e.g. join_collapse_limit
and geqo_threshold. If this is the root cause, you might even rewrite
the query to use optimal join order and set join_collapse_limit=1.
You'll have to play with it.

The other option is using CTEs with materialization, with the same
effect, i.e. prevention of optimization across CTEs, reducing the
total effort.

>In this query I have 3 joins like this:
>
>SELECT t1.id, t2.valid_from
>FROM t1
>JOIN t2 ON (t1.id_t1 = t1.id)
>LEFT JOIN t3 ON (t3.id_t1 = t1.id AND t3.valid_from<t2.valid_from)
>WHERE t3.id IS NULL
>
>If I delete these 3 joins than the planning time goes down from 5.482 
>ms to 754.708 ms but I'm not sure why this context is so demanding on 
>the planner.
>I'm tryng now to make a materialized view that will allow me to stop 
>using the syntax above.
>
>I reattached the same files, they should be fine like this.
>

It'd be useful to have something others can use to reproduce the issue,
and investigate locally. SQL script that creates the whole schema and
runs the query, for example.

What I'd like to see is a perf profile from the planning, so that we can
see where exactly is the bottleneck. Maybe there actually is a bug that
makes it muych more expensive than it should be, in some corner case?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-performance by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Hash Join over Nested Loop
Next
From: Andrew Zakharov
Date:
Subject: Re: performance degredation after upgrade from 9.6 to 12