Thread: Parameter for planner estimate of recursive queries
I've been investigating the poor performance of a WITH RECURSIVE query, which I've recreated with test data. The first step was to rewrite the query, which improved performance by about 30%, but the plan was still very bad. With a small patch I've been able to improve performance by about 100x.

The poor performance is traced to the planner cost estimates for recursive queries. Specifically, the cost of the recursive arm of the query is evaluated based upon both of these hardcoded assumptions:

1. The recursion will last for 10 loops.
2. The average size of the worktable will be 10x the size of the initial query (non-recursive term).

Taken together these assumptions lead to a very poor estimate of the worktable activity (in this case), which leads to the plan changing as a result.

The factor 10 is a reasonably safe assumption and helps avoid worst-case behavior in bigger graph queries. However, it is far too large for many types of graph query, such as where the path through the data is tight, and/or the query is written to prune bushy graphs, e.g. shortest path queries. The factor 10 should not be hardcoded in the planner, but should be settable, just as cursor_tuple_fraction is.

I've written a short patch to make the estimate of the average size of the worktable configurable:

    recursive_worktable_estimate = N    (default 10)

Using this parameter with the test query gives a consistently repeatable ~100x gain in performance, using recursive_worktable_estimate = 1 for a shortest path query:

    Unpatched: 1775 ms
    Patched:   17.2 ms

This is because the estimated size of the worktable is closer to the truth, which naturally leads to a more sensible plan. EXPLAINs attached - please look at the estimated rows for the WorkTable Scan.

There are various options for setting the two estimates: just one or the other, both values separately, or both together. Note that I haven't touched the estimate that recursion will last for 10 loops; I figured that people would object to two knobs. Thoughts?

There are two other ways to speed up the query. One is to set enable_seqscan = off, which helps by about 20% but may have other consequences. The other is to set work_mem = 512MB, so that the poor plan (hash) runs as fast as possible, but that is far from optimal either. Getting the right plan is 20x faster than either of those alternatives.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
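To make the scenario concrete, here is a minimal sketch of the kind of pruned shortest-path query described above (the table and data are hypothetical stand-ins, not the attached test case; recursive_worktable_estimate is the GUC name proposed by the patch):

    -- Hypothetical edge table standing in for the attached test data.
    CREATE TABLE edges (src int, dst int, PRIMARY KEY (src, dst));

    -- The search fans out very little per step, so the worktable stays
    -- close to the size of the non-recursive term; the hardcoded 10x
    -- assumption overestimates it badly.
    SET recursive_worktable_estimate = 1;  -- proposed GUC, default 10

    WITH RECURSIVE paths (node, depth) AS (
        SELECT dst, 1 FROM edges WHERE src = 1
      UNION ALL
        SELECT e.dst, p.depth + 1
        FROM paths p
        JOIN edges e ON e.src = p.node
        WHERE p.depth < 5
    )
    SELECT node, depth FROM paths WHERE node = 42 ORDER BY depth LIMIT 1;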
On Wed, 27 Oct 2021 at 15:58, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> The poor performance is traced to the planner cost estimates for
> recursive queries. Specifically, the cost of the recursive arm of the
> query is evaluated based upon both of these hardcoded assumptions:
>
> 1. The recursion will last for 10 loops
> 2. The average size of the worktable will be 10x the size of the
> initial query (non-recursive term).
>
> Taken together these assumptions lead to a very poor estimate of the
> worktable activity (in this case), which leads to the plan changing as
> a result.
>
> The factor 10 is a reasonably safe assumption and helps avoid worst
> case behavior in bigger graph queries. However, the factor 10 is way
> too large for many types of graph query, such as where the path
> through the data is tight, and/or the query is written to prune bushy
> graphs, e.g. shortest path queries. The factor 10 should not be
> hardcoded in the planner, but should be settable, just as
> cursor_tuple_fraction is.

If you think this should be derived without parameters, then we would want a function that starts at 1 for 1 input row and gets much larger for larger input. The thinking here is that Graph OLTP is often a shortest path between two nodes, whereas Graph Analytics traverses much more of the graph, and so the worktable will get much bigger.

So I'm thinking we use:

    rel->tuples = Max(1, cte_rows * cte_rows / 2);

--
Simon Riggs
http://www.EnterpriseDB.com/
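As a rough illustration of how that quadratic rule would behave (a sketch, not the actual patch):

    /* Sketch only: the proposed parameterless worktable-size estimate.
     * cte_rows is the planner's row estimate for the non-recursive term;
     * Max() floors the estimate at one row.
     */
    rel->tuples = Max(1, cte_rows * cte_rows / 2);

    /* cte_rows = 1    ->     1 row   (graph OLTP, e.g. shortest path)
     * cte_rows = 10   ->    50 rows
     * cte_rows = 100  ->  5000 rows  (graph analytics)
     */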
> The factor 10 should not be hardcoded in the planner, but should be settable, just as cursor_tuple_fraction is.
I feel considerably out of my depth here, but I like the idea of a working table size multiplier GUC, given the challenges of predicting the number of iterations (and any adjustments to cardinality per iteration). A quadratic growth function like the one proposed may lead to increasingly pathological outcomes, especially when the estimates for cte_rows are off.

In the EXPLAINs, it looked like the estimates for knows_pkey were off by an order of magnitude or so. It's possible the planner would have chosen the Nested Loop plan if knows_pkey had been estimated at rows=87 (as the WindowAgg would have been estimated at roughly the same size as in the second plan anyway, even with rel->tuples = 10 * cte_rows).
I also wonder if there's a better default than cte_rows * 10, but implementing a new GUC sounds like a reasonable solution to this as well.
--
Kenaniah
On 31.12.21 15:10, Simon Riggs wrote:
>> The factor 10 is a reasonably safe assumption and helps avoid worst
>> case behavior in bigger graph queries. However, the factor 10 is way
>> too large for many types of graph query, such as where the path
>> through the data is tight, and/or the query is written to prune bushy
>> graphs, e.g. shortest path queries. The factor 10 should not be
>> hardcoded in the planner, but should be settable, just as
>> cursor_tuple_fraction is.
>
> If you think this should be derived without parameters, then we would
> want a function that starts at 1 for 1 input row and gets much larger
> for larger input. The thinking here is that Graph OLTP is often a
> shortest path between two nodes, whereas Graph Analytics traverses much
> more of the graph, and so the worktable will get much bigger.

On the one hand, this smells like a planner hint. But on the other hand, it doesn't look like we will come up with a proper graph-aware selectivity estimation system any time soon, so just having all graph OLTP queries suck until then because the planner hint is hardcoded doesn't seem like a better solution. So I think this setting can be ok.

I think the way you have characterized it makes sense, too: for graph OLAP, you want a larger value; for graph OLTP, you want a smaller value.
On Tue, 25 Jan 2022 at 14:44, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> On the one hand, this smells like a planner hint. But on the other
> hand, it doesn't look like we will come up with a proper graph-aware
> selectivity estimation system any time soon, so just having all graph
> OLTP queries suck until then because the planner hint is hardcoded
> doesn't seem like a better solution. So I think this setting can be ok.
>
> I think the way you have characterized it makes sense, too: for graph
> OLAP, you want a larger value; for graph OLTP, you want a smaller value.
Do you think there is a case for replacing the 10x multiplier with "recursive_worktable_estimate" in the total_rows calculation in the cost_recursive_union() function too?
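For reference, the hardcoded factor the question refers to sits in cost_recursive_union() in src/backend/optimizer/path/costsize.c, roughly as sketched below (a paraphrase from memory, not a verbatim excerpt):

    /* Paraphrased sketch of cost_recursive_union(), costsize.c.
     * nrterm is the non-recursive term's path, rterm the recursive one's.
     */
    total_cost = nrterm->total_cost;
    total_rows = nrterm->rows;

    /* The planner arbitrarily assumes about 10 recursive iterations,
     * each costing as much as, and emitting as many rows as, a single
     * execution of the recursive term.
     */
    total_cost += 10 * rterm->total_cost;
    total_rows += 10 * rterm->rows;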
On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> On the one hand, this smells like a planner hint. But on the other
> hand, it doesn't look like we will come up with a proper graph-aware
> selectivity estimation system any time soon, so just having all graph
> OLTP queries suck until then because the planner hint is hardcoded
> doesn't seem like a better solution. So I think this setting can be ok.

I agree. It's a bit lame, but seems pretty harmless, and I can't see us realistically doing a lot better with any reasonable amount of work.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, 28 Jan 2022 at 14:07, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
> > On the one hand, this smells like a planner hint. But on the other
> > hand, it doesn't look like we will come up with a proper graph-aware
> > selectivity estimation system any time soon, so just having all graph
> > OLTP queries suck until then because the planner hint is hardcoded
> > doesn't seem like a better solution. So I think this setting can be ok.
>
> I agree. It's a bit lame, but seems pretty harmless, and I can't see
> us realistically doing a lot better with any reasonable amount of
> work.

Shall I set this as Ready For Committer?

--
Simon Riggs
http://www.EnterpriseDB.com/
Hi,

On 2022-03-10 17:42:14 +0000, Simon Riggs wrote:
> Shall I set this as Ready For Committer?

Currently this CF entry fails on cfbot: https://cirrus-ci.com/task/4531771134967808?logs=test_world#L1158

[16:27:35.772] # Failed test 'no parameters missing from postgresql.conf.sample'
[16:27:35.772] # at t/003_check_guc.pl line 82.
[16:27:35.772] # got: '1'
[16:27:35.772] # expected: '0'
[16:27:35.772] # Looks like you failed 1 test of 3.
[16:27:35.772] [16:27:35] t/003_check_guc.pl ..............

Marked as waiting on author.

- Andres
On Tue, 22 Mar 2022 at 00:04, Andres Freund <andres@anarazel.de> wrote:
> On 2022-03-10 17:42:14 +0000, Simon Riggs wrote:
> > Shall I set this as Ready For Committer?
>
> Currently this CF entry fails on cfbot: https://cirrus-ci.com/task/4531771134967808?logs=test_world#L1158
>
> [16:27:35.772] # Failed test 'no parameters missing from postgresql.conf.sample'
> [16:27:35.772] # at t/003_check_guc.pl line 82.
> [16:27:35.772] # got: '1'
> [16:27:35.772] # expected: '0'
> [16:27:35.772] # Looks like you failed 1 test of 3.
> [16:27:35.772] [16:27:35] t/003_check_guc.pl ..............
>
> Marked as waiting on author.

Thanks. I've corrected that by adding a line to postgresql.conf.sample, as well as adding docs.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
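The added sample-file entry would be something along these lines (a sketch using the patch's then-current GUC name; the exact comment wording is whatever the attached patch contains):

    #recursive_worktable_estimate = 10.0   # multiplier applied to the
                                           # non-recursive term's row estimate
                                           # to guess the average worktable size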
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> On the one hand, this smells like a planner hint. But on the other
>> hand, it doesn't look like we will come up with a proper graph-aware
>> selectivity estimation system any time soon, so just having all graph
>> OLTP queries suck until then because the planner hint is hardcoded
>> doesn't seem like a better solution. So I think this setting can be ok.

> I agree. It's a bit lame, but seems pretty harmless, and I can't see
> us realistically doing a lot better with any reasonable amount of
> work.

Yeah, agreed on all counts. The thing that makes it lame is that there's no reason to expect that the same multiplier is good for every recursive query done in an installation, or even in a session.

One could imagine dealing with that by adding custom syntax to WITH, as we have already done once:

    WITH RECURSIVE cte1 AS SCALE 1.0 (SELECT ...

But I *really* hesitate to go there, mainly because once we do something like that we can't ever undo it. I think Simon's proposal is a reasonable low-effort compromise.

Some nitpicks:

* The new calculation needs clamp_row_est(), since the float GUC could be fractional or even zero.

* Do we want to prevent the GUC value from being zero? It's not very sensible, plus I think we might want to reserve that value to mean "use the built-in calculation", in case we ever do put in some smarter logic here. But I'm not sure what a reasonable non-zero lower bound would be.

* The proposed docs claim that a smaller setting works by biasing the planner towards fast-start plans, but I don't think I believe that explanation. I'd venture that we want text more along the lines of "This may help the planner choose the most appropriate method for joining the work table to the query's other tables".

			regards, tom lane
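Concretely, the first nitpick amounts to something like this (a sketch of the adjusted worktable-size calculation, not the committed code):

    /* Sketch only: clamp_row_est() rounds the scaled estimate to a sane
     * integral row count and enforces a minimum of one row, so that a
     * fractional or zero GUC value cannot produce a nonsensical estimate.
     */
    rel->tuples = clamp_row_est(recursive_worktable_estimate * cte_rows);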
On Wed, 23 Mar 2022 at 17:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Some nitpicks:
>
> * The new calculation needs clamp_row_est(), since the float
> GUC could be fractional or even zero.

True, will do.

> * Do we want to prevent the GUC value from being zero? It's not
> very sensible, plus I think we might want to reserve that value
> to mean "use the built-in calculation", in case we ever do put
> in some smarter logic here. But I'm not sure what a reasonable
> non-zero lower bound would be.

Agreed, makes sense.

> * The proposed docs claim that a smaller setting works by biasing
> the planner towards fast-start plans, but I don't think I believe
> that explanation. I'd venture that we want text more along the
> lines of "This may help the planner choose the most appropriate
> method for joining the work table to the query's other tables".

OK, will improve.

[New patch version pending]

--
Simon Riggs
http://www.EnterpriseDB.com/
On Wed, 23 Mar 2022 at 18:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> [New patch version pending]

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
Simon Riggs <simon.riggs@enterprisedb.com> writes:
>> [New patch version pending]

Do you have any objection if I rename the GUC to recursive_worktable_factor? That seems a bit clearer as to what it does, and it leaves more room for other knobs in the same area if we decide we need any.

			regards, tom lane
On Wed, 23 Mar 2022 at 20:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> >> [New patch version pending]
>
> Do you have any objection if I rename the GUC to
> recursive_worktable_factor? That seems a bit clearer as to what
> it does, and it leaves more room for other knobs in the same area
> if we decide we need any.

None, I think your proposal is a better name.

--
Simon Riggs
http://www.EnterpriseDB.com/
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> On Wed, 23 Mar 2022 at 20:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Do you have any objection if I rename the GUC to
>> recursive_worktable_factor? That seems a bit clearer as to what
>> it does, and it leaves more room for other knobs in the same area
>> if we decide we need any.

> None, I think your proposal is a better name.

Pushed that way.

			regards, tom lane
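For later readers: with the committed name, per-query tuning is then a matter of something like this (a sketch; 10.0 remains the default):

    -- Shortest-path style query: tell the planner the worktable stays small.
    SET recursive_worktable_factor = 1.0;
    -- ... run the WITH RECURSIVE query ...
    RESET recursive_worktable_factor;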
On Thu, 24 Mar 2022 at 15:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > On Wed, 23 Mar 2022 at 20:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Do you have any objection if I rename the GUC to
> >> recursive_worktable_factor? That seems a bit clearer as to what
> >> it does, and it leaves more room for other knobs in the same area
> >> if we decide we need any.
>
> > None, I think your proposal is a better name.
>
> Pushed that way.

Ok, thanks.

--
Simon Riggs
http://www.EnterpriseDB.com/