Thread: Parameter for planner estimate of recursive queries

Parameter for planner estimate of recursive queries

From
Simon Riggs
Date:
I've been investigating the poor performance of a WITH RECURSIVE
query, which I've recreated with test data.

The first thing was to re-write the query, which helped improve
performance by about 30%, but the plan was still very bad. With a
small patch I've been able to improve performance by about x100.

The poor performance is traced to the planner cost estimates for
recursive queries. Specifically, the cost of the recursive arm of the
query is evaluated based upon both of these hardcoded assumptions:

1. The recursion will last for 10 loops
2. The average size of the worktable will be 10x the size of the
initial query (non-recursive term).

Taken together these assumptions lead to a very poor estimate of the
worktable activity (in this case), which leads to the plan changing as
a result.

The factor 10 is a reasonably safe assumption and helps avoid worst
case behavior in bigger graph queries. However, the factor 10 is way
too large for many types of graph query, such as where the path
through the data is tight, and/or the query is written to prune bushy
graphs, e.g. shortest path queries. The factor 10 should not be
hardcoded in the planner, but should be settable, just as
cursor_tuple_fraction is.

I've written a short patch to make the estimate of the avg size of the
worktable configurable:

  recursive_worktable_estimate = N (default 10)

Using this parameter with the test query results in a consistently
repeatable ~100x gain in performance, using
recursive_worktable_estimate = 1 for a shortest path query:

Unpatched: 1775ms
Patched: 17.2ms

This is because the estimated size of the worktable is closer to the
truth and so leads naturally to a more sensible plan. EXPLAINs
attached - please look at the estimated rows for the WorkTable Scan.

There are various options for setting the two estimates: just one, or
other, or both values separately, or both together. Note that I
haven't touched the estimate that recursion will last for 10 loops. I
figured that people would object to two knobs.

Thoughts?


There are 2 other ways to speed up the query. One is to set
enable_seqscan = off, which helps about 20%, but may have other
consequences. Two is to set work_mem = 512MB, so that the poor plan
(hash) works as fast as possible, but that is far from optimal either.
Getting the right plan is x20 faster than either of those
alternatives.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Parameter for planner estimate of recursive queries

From
Simon Riggs
Date:
On Wed, 27 Oct 2021 at 15:58, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> The poor performance is traced to the planner cost estimates for
> recursive queries. Specifically, the cost of the recursive arm of the
> query is evaluated based upon both of these hardcoded assumptions:
>
> 1. The recursion will last for 10 loops
> 2. The average size of the worktable will be 10x the size of the
> initial query (non-recursive term).
>
> Taken together these assumptions lead to a very poor estimate of the
> worktable activity (in this case), which leads to the plan changing as
> a result.
>
> The factor 10 is a reasonably safe assumption and helps avoid worst
> case behavior in bigger graph queries. However, the factor 10 is way
> too large for many types of graph query, such as where the path
> through the data is tight, and/or the query is written to prune bushy
> graphs, e.g. shortest path queries. The factor 10 should not be
> hardcoded in the planner, but should be settable, just as
> cursor_tuple_fraction is.

If you think this should be derived without parameters, then we would
want a function that starts at 1 for 1 input row and gets much larger
for larger input. The thinking here is that Graph OLTP is often a
shortest path between two nodes, whereas Graph Analytics and so the
worktable will get much bigger.

So I'm, thinking we use

rel->tuples = min(1, cte_rows * cte_rows/2);

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Parameter for planner estimate of recursive queries

From
Kenaniah Cerny
Date:
> The factor 10 should not be hardcoded in the planner, but should be settable, just as cursor_tuple_fraction is.

I feel considerably out of my depth here, but I like the idea of a working table size multiplier GUC, given the challenges of predicting the number of iterations (and any adjustments to cardinality per iteration). An exponential cost function may lead to increasingly pathological outcomes, especially when estimates for cte_rows are off. 

In the EXPLAINs, it looked like the estimates for knows_pkey were off by an order of magnitude or so. It's possible the planner would have chosen the Nested Loop plan if knows_pkey had estimated to rows=87 (as the WindowAgg would have estimated to roughly the same size as the second plan anyways, even with rel->tuples = 10 * cte_rows).

I also wonder if there's a better default than cte_rows * 10, but implementing a new GUC sounds like a reasonable solution to this as well.

--

Kenaniah

On Sat, Jan 22, 2022 at 1:58 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Wed, 27 Oct 2021 at 15:58, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> The poor performance is traced to the planner cost estimates for
> recursive queries. Specifically, the cost of the recursive arm of the
> query is evaluated based upon both of these hardcoded assumptions:
>
> 1. The recursion will last for 10 loops
> 2. The average size of the worktable will be 10x the size of the
> initial query (non-recursive term).
>
> Taken together these assumptions lead to a very poor estimate of the
> worktable activity (in this case), which leads to the plan changing as
> a result.
>
> The factor 10 is a reasonably safe assumption and helps avoid worst
> case behavior in bigger graph queries. However, the factor 10 is way
> too large for many types of graph query, such as where the path
> through the data is tight, and/or the query is written to prune bushy
> graphs, e.g. shortest path queries. The factor 10 should not be
> hardcoded in the planner, but should be settable, just as
> cursor_tuple_fraction is.

If you think this should be derived without parameters, then we would
want a function that starts at 1 for 1 input row and gets much larger
for larger input. The thinking here is that Graph OLTP is often a
shortest path between two nodes, whereas Graph Analytics and so the
worktable will get much bigger.

So I'm, thinking we use

rel->tuples = min(1, cte_rows * cte_rows/2);

--
Simon Riggs                http://www.EnterpriseDB.com/




Re: Parameter for planner estimate of recursive queries

From
Peter Eisentraut
Date:
On 31.12.21 15:10, Simon Riggs wrote:
>> The factor 10 is a reasonably safe assumption and helps avoid worst
>> case behavior in bigger graph queries. However, the factor 10 is way
>> too large for many types of graph query, such as where the path
>> through the data is tight, and/or the query is written to prune bushy
>> graphs, e.g. shortest path queries. The factor 10 should not be
>> hardcoded in the planner, but should be settable, just as
>> cursor_tuple_fraction is.
> If you think this should be derived without parameters, then we would
> want a function that starts at 1 for 1 input row and gets much larger
> for larger input. The thinking here is that Graph OLTP is often a
> shortest path between two nodes, whereas Graph Analytics and so the
> worktable will get much bigger.

On the one hand, this smells like a planner hint.  But on the other 
hand, it doesn't look like we will come up with proper graph-aware 
selectivity estimation system any time soon, so just having all graph 
OLTP queries suck until then because the planner hint is hardcoded 
doesn't seem like a better solution.  So I think this setting can be ok. 
  I think the way you have characterized it makes sense, too: for graph 
OLAP, you want a larger value, for graph OLTP, you want a smaller value.




Re: Parameter for planner estimate of recursive queries

From
Hamid Akhtar
Date:


On Tue, 25 Jan 2022 at 14:44, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
On 31.12.21 15:10, Simon Riggs wrote:
>> The factor 10 is a reasonably safe assumption and helps avoid worst
>> case behavior in bigger graph queries. However, the factor 10 is way
>> too large for many types of graph query, such as where the path
>> through the data is tight, and/or the query is written to prune bushy
>> graphs, e.g. shortest path queries. The factor 10 should not be
>> hardcoded in the planner, but should be settable, just as
>> cursor_tuple_fraction is.
> If you think this should be derived without parameters, then we would
> want a function that starts at 1 for 1 input row and gets much larger
> for larger input. The thinking here is that Graph OLTP is often a
> shortest path between two nodes, whereas Graph Analytics and so the
> worktable will get much bigger.

On the one hand, this smells like a planner hint.  But on the other
hand, it doesn't look like we will come up with proper graph-aware
selectivity estimation system any time soon, so just having all graph
OLTP queries suck until then because the planner hint is hardcoded
doesn't seem like a better solution.  So I think this setting can be ok.
  I think the way you have characterized it makes sense, too: for graph
OLAP, you want a larger value, for graph OLTP, you want a smaller value.

Do you think there is a case to replace the 10x multiplier with "recursive_worktable_estimate" for total_rows calculation in the cost_recursive_union function too?
 

Re: Parameter for planner estimate of recursive queries

From
Robert Haas
Date:
On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> On the one hand, this smells like a planner hint.  But on the other
> hand, it doesn't look like we will come up with proper graph-aware
> selectivity estimation system any time soon, so just having all graph
> OLTP queries suck until then because the planner hint is hardcoded
> doesn't seem like a better solution.  So I think this setting can be ok.

I agree. It's a bit lame, but seems pretty harmless, and I can't see
us realistically doing a lot better with any reasonable amount of
work.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parameter for planner estimate of recursive queries

From
Simon Riggs
Date:
On Fri, 28 Jan 2022 at 14:07, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
> > On the one hand, this smells like a planner hint.  But on the other
> > hand, it doesn't look like we will come up with proper graph-aware
> > selectivity estimation system any time soon, so just having all graph
> > OLTP queries suck until then because the planner hint is hardcoded
> > doesn't seem like a better solution.  So I think this setting can be ok.
>
> I agree. It's a bit lame, but seems pretty harmless, and I can't see
> us realistically doing a lot better with any reasonable amount of
> work.

Shall I set this as Ready For Committer?

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Parameter for planner estimate of recursive queries

From
Andres Freund
Date:
Hi,

On 2022-03-10 17:42:14 +0000, Simon Riggs wrote:
> Shall I set this as Ready For Committer?

Currently this CF entry fails on cfbot: https://cirrus-ci.com/task/4531771134967808?logs=test_world#L1158

[16:27:35.772] #   Failed test 'no parameters missing from postgresql.conf.sample'
[16:27:35.772] #   at t/003_check_guc.pl line 82.
[16:27:35.772] #          got: '1'
[16:27:35.772] #     expected: '0'
[16:27:35.772] # Looks like you failed 1 test of 3.
[16:27:35.772] [16:27:35] t/003_check_guc.pl ..............

Marked as waiting on author.

- Andres



Re: Parameter for planner estimate of recursive queries

From
Simon Riggs
Date:
On Tue, 22 Mar 2022 at 00:04, Andres Freund <andres@anarazel.de> wrote:

> On 2022-03-10 17:42:14 +0000, Simon Riggs wrote:
> > Shall I set this as Ready For Committer?
>
> Currently this CF entry fails on cfbot: https://cirrus-ci.com/task/4531771134967808?logs=test_world#L1158
>
> [16:27:35.772] #   Failed test 'no parameters missing from postgresql.conf.sample'
> [16:27:35.772] #   at t/003_check_guc.pl line 82.
> [16:27:35.772] #          got: '1'
> [16:27:35.772] #     expected: '0'
> [16:27:35.772] # Looks like you failed 1 test of 3.
> [16:27:35.772] [16:27:35] t/003_check_guc.pl ..............
>
> Marked as waiting on author.

Thanks.

I've corrected that by adding a line to postgresql.conf.sample, as
well as adding docs.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Parameter for planner estimate of recursive queries

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> On the one hand, this smells like a planner hint.  But on the other
>> hand, it doesn't look like we will come up with proper graph-aware
>> selectivity estimation system any time soon, so just having all graph
>> OLTP queries suck until then because the planner hint is hardcoded
>> doesn't seem like a better solution.  So I think this setting can be ok.

> I agree. It's a bit lame, but seems pretty harmless, and I can't see
> us realistically doing a lot better with any reasonable amount of
> work.

Yeah, agreed on all counts.  The thing that makes it lame is that
there's no reason to expect that the same multiplier is good for
every recursive query done in an installation, or even in a session.

One could imagine dealing with that by adding custom syntax to WITH,
as we have already done once:

WITH RECURSIVE cte1 AS SCALE 1.0 (SELECT ...

But I *really* hesitate to go there, mainly because once we do
something like that we can't ever undo it.  I think Simon's
proposal is a reasonable low-effort compromise.

Some nitpicks:

* The new calculation needs clamp_row_est(), since the float
GUC could be fractional or even zero.

* Do we want to prevent the GUC value from being zero?  It's not
very sensible, plus I think we might want to reserve that value
to mean "use the built-in calculation", in case we ever do put
in some smarter logic here.  But I'm not sure what a reasonable
non-zero lower bound would be.

* The proposed docs claim that a smaller setting works by biasing
the planner towards fast-start plans, but I don't think I believe
that explanation.  I'd venture that we want text more along the
lines of "This may help the planner choose the most appropriate
method for joining the work table to the query's other tables".

            regards, tom lane



Re: Parameter for planner estimate of recursive queries

From
Simon Riggs
Date:
On Wed, 23 Mar 2022 at 17:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
> > <peter.eisentraut@enterprisedb.com> wrote:
> >> On the one hand, this smells like a planner hint.  But on the other
> >> hand, it doesn't look like we will come up with proper graph-aware
> >> selectivity estimation system any time soon, so just having all graph
> >> OLTP queries suck until then because the planner hint is hardcoded
> >> doesn't seem like a better solution.  So I think this setting can be ok.
>
> > I agree. It's a bit lame, but seems pretty harmless, and I can't see
> > us realistically doing a lot better with any reasonable amount of
> > work.
>
> Yeah, agreed on all counts.  The thing that makes it lame is that
> there's no reason to expect that the same multiplier is good for
> every recursive query done in an installation, or even in a session.
>
> One could imagine dealing with that by adding custom syntax to WITH,
> as we have already done once:
>
> WITH RECURSIVE cte1 AS SCALE 1.0 (SELECT ...
>
> But I *really* hesitate to go there, mainly because once we do
> something like that we can't ever undo it.  I think Simon's
> proposal is a reasonable low-effort compromise.
>
> Some nitpicks:
>
> * The new calculation needs clamp_row_est(), since the float
> GUC could be fractional or even zero.

True, will do.

> * Do we want to prevent the GUC value from being zero?  It's not
> very sensible, plus I think we might want to reserve that value
> to mean "use the built-in calculation", in case we ever do put
> in some smarter logic here.  But I'm not sure what a reasonable
> non-zero lower bound would be.

Agreed, makes sense.

> * The proposed docs claim that a smaller setting works by biasing
> the planner towards fast-start plans, but I don't think I believe
> that explanation.  I'd venture that we want text more along the
> lines of "This may help the planner choose the most appropriate
> method for joining the work table to the query's other tables".

OK, will improve.

[New patch version pending]

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Parameter for planner estimate of recursive queries

From
Simon Riggs
Date:
On Wed, 23 Mar 2022 at 18:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> [New patch version pending]

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Parameter for planner estimate of recursive queries

From
Tom Lane
Date:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
>> [New patch version pending]

Do you have any objection if I rename the GUC to
recursive_worktable_factor?  That seems a bit clearer as to what
it does, and it leaves more room for other knobs in the same area
if we decide we need any.

            regards, tom lane



Re: Parameter for planner estimate of recursive queries

From
Simon Riggs
Date:
On Wed, 23 Mar 2022 at 20:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> >> [New patch version pending]
>
> Do you have any objection if I rename the GUC to
> recursive_worktable_factor?  That seems a bit clearer as to what
> it does, and it leaves more room for other knobs in the same area
> if we decide we need any.

None, I think your proposal is a better name.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Parameter for planner estimate of recursive queries

From
Tom Lane
Date:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> On Wed, 23 Mar 2022 at 20:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Do you have any objection if I rename the GUC to
>> recursive_worktable_factor?  That seems a bit clearer as to what
>> it does, and it leaves more room for other knobs in the same area
>> if we decide we need any.

> None, I think your proposal is a better name.

Pushed that way.

            regards, tom lane



Re: Parameter for planner estimate of recursive queries

From
Simon Riggs
Date:
On Thu, 24 Mar 2022 at 15:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > On Wed, 23 Mar 2022 at 20:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Do you have any objection if I rename the GUC to
> >> recursive_worktable_factor?  That seems a bit clearer as to what
> >> it does, and it leaves more room for other knobs in the same area
> >> if we decide we need any.
>
> > None, I think your proposal is a better name.
>
> Pushed that way.

Ok, thanks.

-- 
Simon Riggs                http://www.EnterpriseDB.com/