Thread: Why enable_hashjoin completely disables HashJoin
Hi,

I found that enable_hashjoin disables HashJoin completely. It happens in
the function add_paths_to_joinrel:

    if (enable_hashjoin || jointype == JOIN_FULL)
        hash_inner_and_outer(root, joinrel, outerrel, innerrel,
                             jointype, &extra);

Instead, it should add a disable cost to the hash join cost calculation,
which final_cost_hashjoin already does:

    if (!enable_hashjoin)
        startup_cost += disable_cost;

enable_mergejoin has the same problem.

Test case:

    CREATE TABLE t_score_01(
        s_id int,
        s_score int,
        s_course char(8),
        c_id int);

    CREATE TABLE t_student_01(
        s_id int,
        s_name char(8));

    insert into t_score_01 values(
        generate_series(1, 1000000), random()*100, 'course',
        generate_series(1, 1000000));

    insert into t_student_01 values(generate_series(1, 1000000), 'name');

    analyze t_score_01;
    analyze t_student_01;

    SET enable_hashjoin TO off;
    SET enable_nestloop TO off;
    SET enable_mergejoin TO off;

    explain select count(*)
    from t_student_01 a join t_score_01 b on a.s_id = b.s_id;

After disabling all three, the HashJoin path should still be chosen.

Attached is the patch file.

--
Quan Zongliang
Vastdata
On 4/3/23 12:23, Quan Zongliang wrote:
> I found that enable_hashjoin disables HashJoin completely.
> It happens in the function add_paths_to_joinrel:
>
>     if (enable_hashjoin || jointype == JOIN_FULL)
>         hash_inner_and_outer(root, joinrel, outerrel, innerrel,
>                              jointype, &extra);
>
> Instead, it should add a disable cost to the hash join cost
> calculation, which final_cost_hashjoin already does:
>
>     if (!enable_hashjoin)
>         startup_cost += disable_cost;
>
> [...]
>
> After disabling all three, the HashJoin path should still be chosen.

It's not clear to me why that behavior would be desirable. Why is this
an issue you need to solve?

AFAIK the reason why some paths are actually disabled (not built at all)
while others are only penalized by adding disable_cost is that we need
to end up with at least one way to execute the query. So we pick a path
that we know is possible (e.g. seqscan) and hard-disable other paths.
But the always-possible path is only soft-disabled by disable_cost.

For joins, we do the same thing. The hash/merge joins may not be
possible, because the data types may not have hash/sort operators, etc.
Nestloop is always possible. So we soft-disable nestloop but
hard-disable hash/merge joins.

I doubt we want to change this behavior, unless there's a good reason to
do that ...
regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
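The hard-disable/soft-disable distinction Tomas describes can be modeled with a small standalone sketch. This is hypothetical illustration code, not PostgreSQL source; the path constructors, GUC variables, and cost values are invented for the example:

```c
/* Toy model of the two disabling strategies: hard disable (the path is
 * never built) vs. soft disable (the path is built but penalized).
 * Hypothetical sketch only; not actual PostgreSQL code. */
#include <stdbool.h>

#define DISABLE_COST 1.0e10            /* mirrors the planner's disable_cost */

typedef struct Path
{
    const char *name;
    double      cost;
    bool        valid;                 /* was the path generated at all? */
} Path;

static bool enable_seqscan = false;    /* pretend both settings are off */
static bool enable_indexscan = false;

/* Soft disable: the always-possible path is still built, only penalized,
 * so it survives even when everything is turned off. */
static Path make_seqscan_path(double base_cost)
{
    Path p = { "SeqScan", base_cost, true };

    if (!enable_seqscan)
        p.cost += DISABLE_COST;
    return p;
}

/* Hard disable: the path is simply never generated. */
static Path make_indexscan_path(double base_cost)
{
    Path p = { "IndexScan", base_cost, enable_indexscan };

    return p;
}

/* Pick the cheaper of two candidate paths, ignoring ungenerated ones. */
static Path cheapest_path(Path a, Path b)
{
    if (!a.valid)
        return b;
    if (!b.valid)
        return a;
    return (a.cost <= b.cost) ? a : b;
}
```

With both settings off, `cheapest_path` still returns the seqscan path: it was generated with a huge penalty rather than suppressed, which is exactly why the query remains plannable.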
Quan Zongliang <quanzongliang@yeah.net> writes:
> I found that enable_hashjoin disables HashJoin completely.

Well, yeah. It's what you asked for.

> Instead, it should add a disable cost to the hash join cost
> calculation.

Why? The disable-cost stuff is a crude hack that we use when turning
off a particular plan type entirely might render us unable to generate
a valid plan. Hash join is not in that category.

> After disabling all three, the HashJoin path should still be chosen.

Why?

Personally, I'd get rid of disable_cost altogether if I could.
I'm not in a hurry to extend its use to more places.

			regards, tom lane
On Mon, Apr 3, 2023 at 8:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Personally, I'd get rid of disable_cost altogether if I could.
> I'm not in a hurry to extend its use to more places.

I agree. I've wondered if we should put some work into that. It feels
bad to waste CPU cycles generating paths we intend to basically just
throw away, and it feels even worse if they manage to beat out some
other path on cost.

It hasn't been obvious to me how we could restructure the existing
logic to avoid relying on disable_cost. I sort of feel like it should
be a two-pass algorithm: go through and generate all the path types
that aren't disabled, and then if that results in no paths, try a
do-over where you ignore the disable flags (or just some of them). But
the code structure doesn't seem particularly amenable to that kind of
thing.

This hasn't caused me enough headaches yet that I've been willing to
invest time in it, but it has caused me more than zero headaches...

--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
> It hasn't been obvious to me how we could restructure the existing
> logic to avoid relying on disable_cost.

Yeah. In some places it would not be too hard; for example, if we
generated seqscan paths last instead of first for baserels, the rule
could be "generate it if enable_seqscan is on OR if we made no other
path for the rel". It's much messier for joins though, partly because
the same joinrel will be considered multiple times as we process
different join orderings, plus it's usually unclear whether failing
to generate any paths for joinrel X will lead to overall failure.

A solution that would work is to treat disable_cost as a form of
infinity that's counted separately from the actual cost estimate, that
is we label paths as "cost X, plus there are N uses of disabled plan
types". Then you sort first on N and after that on X. But this'd add a
good number of cycles to add_path, which I've not wanted to expend on a
non-mainstream usage.

			regards, tom lane
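The "sort first on N and after that on X" idea amounts to a lexicographic path comparison. A minimal sketch, assuming a hypothetical Path struct carrying a separate disabled-node counter (this is not the actual add_path code, just an illustration of the comparison rule):

```c
/* Sketch of comparing paths by (disabled-node count, cost),
 * lexicographically. Hypothetical code, not PostgreSQL source. */
typedef struct Path
{
    int    disabled_nodes;  /* how many disabled plan types this path uses */
    double total_cost;
} Path;

/* Returns < 0 if a is better, > 0 if b is better, 0 if tied.
 * Fewer disabled nodes always wins; cost only breaks ties. */
static int compare_paths(const Path *a, const Path *b)
{
    if (a->disabled_nodes != b->disabled_nodes)
        return (a->disabled_nodes < b->disabled_nodes) ? -1 : 1;
    if (a->total_cost != b->total_cost)
        return (a->total_cost < b->total_cost) ? -1 : 1;
    return 0;
}
```

Under this rule a cheap path that uses one disabled node still loses to an expensive path that uses none, yet when every alternative is disabled, plain cost decides among them, so the genuinely cheapest disabled path (e.g. the hash join from the original report) is chosen instead of whichever one happens to fall under disable_cost first.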
On Mon, Apr 3, 2023 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah. In some places it would not be too hard; for example, if we
> generated seqscan paths last instead of first for baserels, the rule
> could be "generate it if enable_seqscan is on OR if we made no other
> path for the rel". It's much messier for joins though, [...]

Yeah, good point. I'm now remembering that at one point I'd had the
idea of running the whole find-a-plan-for-a-jointree step and then
running it a second time if it fails to find a plan. But I think that
requires some restructuring, because I think right now it does some
things that we should only do once we know we're definitely getting a
plan out. Or else we have to reset some state. Like if we want to go
back and maybe add more paths then we have to undo and redo whatever
set_cheapest() did.

> A solution that would work is to treat disable_cost as a form of
> infinity that's counted separately from the actual cost estimate,
> [...] But this'd add a good number of cycles to add_path, which I've
> not wanted to expend on a non-mainstream usage.

Yeah, I thought of that at one point too and rejected it for the same
reason.

--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,

On 2023-04-03 14:04:30 -0400, Tom Lane wrote:
> A solution that would work is to treat disable_cost as a form of
> infinity that's counted separately from the actual cost estimate, that
> is we label paths as "cost X, plus there are N uses of disabled plan
> types". Then you sort first on N and after that on X. But this'd add a
> good number of cycles to add_path, which I've not wanted to expend on
> a non-mainstream usage.

It sounds too hard compared to the gains, but another way could be to
plan with the relevant path generation hard-disabled, and then plan
from scratch, with additional scan types enabled, if we end up being
unable to generate a valid plan.

Greetings,

Andres Freund
On Tue, 4 Apr 2023 at 11:18, Andres Freund <andres@anarazel.de> wrote:
> It sounds too hard compared to the gains, but another way could be to
> plan with the relevant path generation hard-disabled, and then plan
> from scratch, with additional scan types enabled, if we end up being
> unable to generate a valid plan.

I think there would be quite a bit of work to do before we could ever
start to think about that. The planner does quite a bit of writing on
the parse, e.g. adding new RangeTblEntrys to the query's rtable. We'd
either need to fix all those first or make a copy of the parse before
planning. The latter is quite expensive today.

It's also not clear to me how you'd know what you'd need to enable
again to get the 2nd attempt to produce a plan this time around. I'd
assume you'd want the minimum possible set of enable_* GUCs turned back
on, but what would you do in cases where there's an aggregate, both
enable_hashagg and enable_sort are disabled, and there are no indexes
providing pre-sorted input?

David
Andres Freund <andres@anarazel.de> writes:
> It sounds too hard compared to the gains, but another way could be to
> plan with the relevant path generation hard-disabled, and then plan
> from scratch, with additional scan types enabled, if we end up being
> unable to generate a valid plan.

Actually, I kind of like that. It would put the extra cost in a place
it belongs: if you have enough enable_foo turned off to prevent
generating a valid plan, it'll cost you extra to make a plan ... but
likely you'll be paying even more in runtime due to not getting a good
plan, so maybe that doesn't matter anyway. I'd limit it to two passes:
first try honors all enable_foo switches, second try ignores all.

I'm not quite sure how this could be wedged into the existing code
structure --- in particular I am not sure that we're prepared to do two
passes of baserel path generation. (GEQO is an existence proof that we
could handle it for join planning, though.)

Or we could rethink the design goal of not allowing enable_foo switches
to cause us to fail to make a plan. That might be unusable though.

			regards, tom lane
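The two-pass scheme described above (first try honors all enable_foo switches, second try ignores them all) can be sketched as a toy standalone model. The names (plan_join, plan_with_retry) and the fixed candidate paths are invented for illustration; this is not the planner's actual structure:

```c
/* Toy model of two-pass planning: honor the enable_* switches, and
 * only if no path at all comes out, replan ignoring them.
 * Hypothetical sketch, not PostgreSQL code. */
#include <stdbool.h>
#include <stddef.h>

typedef struct Path
{
    const char *name;
    double      cost;
} Path;

static bool enable_hashjoin = false;   /* pretend both are off */
static bool enable_mergejoin = false;

/* Generate join paths and return the cheapest, or NULL if none.
 * honor_switches == false is the "ignore all disable flags" retry. */
static const Path *plan_join(bool honor_switches)
{
    static const Path hash_path = { "HashJoin", 100.0 };
    static const Path merge_path = { "MergeJoin", 300.0 };
    const Path *best = NULL;

    if (!honor_switches || enable_hashjoin)
        best = &hash_path;
    if ((!honor_switches || enable_mergejoin) &&
        (best == NULL || merge_path.cost < best->cost))
        best = &merge_path;
    return best;
}

static const Path *plan_with_retry(void)
{
    const Path *best = plan_join(true);  /* pass 1: honor switches */

    if (best == NULL)
        best = plan_join(false);         /* pass 2: ignore them all */
    return best;
}
```

With both join types disabled, the first pass produces nothing and the retry picks the genuinely cheapest path; the extra planning cost is only paid by queries that were otherwise unplannable, which is the property Tom points out above.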
David Rowley <dgrowleyml@gmail.com> writes:
> I think there would be quite a bit of work to do before we could ever
> start to think about that. The planner does quite a bit of writing on
> the parse, e.g. adding new RangeTblEntrys to the query's rtable. We'd
> either need to fix all those first or make a copy of the parse before
> planning.

Yeah, we'd have to be sure that all that preliminary work is teased
apart from the actual path-making. I think we are probably pretty close
to that but not there yet. Subqueries might be problematic, but perhaps
we could define our way out of that by saying that this retry principle
applies independently in each planner recursion level.

> It's also not clear to me how you'd know what you'd need to enable
> again to get the 2nd attempt to produce a plan this time around. I'd
> assume you'd want the minimum possible set of enable_* GUCs turned
> back on, but what would you do in cases where there's an aggregate,
> both enable_hashagg and enable_sort are disabled, and there are no
> indexes providing pre-sorted input?

As I commented concurrently, I think we should simply not try to solve
that conundrum: if you want control, don't pose impossible problems.
There's no principled way that we could decide which of enable_hashagg
and enable_sort to ignore first, for example.

			regards, tom lane
On 2023/4/3 19:44, Tomas Vondra wrote:
> On 4/3/23 12:23, Quan Zongliang wrote:
>> I found that enable_hashjoin disables HashJoin completely.
>> [...]
>> After disabling all three, the HashJoin path should still be chosen.
>
> It's not clear to me why that behavior would be desirable. Why is this
> an issue you need to solve?

Because someone noticed that when they set enable_hashjoin,
enable_mergejoin and enable_nestloop to off, the statement seemed to
get stuck (actually because it chose the NestedLoop path, which took a
very long time to run).

If enable_hashjoin and enable_mergejoin disable generating these two
paths, then enable_nestloop should do the same thing. But it doesn't.

> AFAIK the reason why some paths are actually disabled (not built at
> all) while others are only penalized by adding disable_cost is that we
> need to end up with at least one way to execute the query. So we pick
> a path that we know is possible (e.g. seqscan) and hard-disable other
> paths. But the always-possible path is only soft-disabled by
> disable_cost.
>
> For joins, we do the same thing. The hash/merge joins may not be
> possible, because the data types may not have hash/sort operators,
> etc. Nestloop is always possible. So we soft-disable nestloop but
> hard-disable hash/merge joins.
>
> I doubt we want to change this behavior, unless there's a good reason
> to do that ...

It doesn't have to change. Selecting NestedLoop doesn't really get
stuck either; it just takes too long to run.

I will change the patch status to Withdrawn.
On Tue, Apr 4, 2023 at 3:38 AM Quan Zongliang <quanzongliang@yeah.net> wrote:
> Because someone noticed that when they set enable_hashjoin,
> enable_mergejoin and enable_nestloop to off, the statement seemed to
> get stuck (actually because it chose the NestedLoop path, which took a
> very long time to run).
>
> If enable_hashjoin and enable_mergejoin disable generating these two
> paths, then enable_nestloop should do the same thing. But it doesn't.

This all seems like expected behavior. If you disable an entire plan
type, you should expect to get some bad plans. And if you disable all
the plan types, you should still expect to get some plan, but maybe an
extremely bad one.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, 3 Apr 2023 at 19:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Or we could rethink the design goal of not allowing enable_foo
> switches to cause us to fail to make a plan. That might be unusable
> though.

Off the top of my head I don't see why. It's not like the possible
plans are going to change on you often, only when DDL changes the
schema.

The only one that gives me pause is enable_seqscan. I've seen multiple
sites that turn it off as a hammer to force OLTP-style plans. They
still get sequential scans where they're absolutely necessary, such as
on small reference tables with no usable index, and they rely on that
behaviour. In that case we would ideally generate a realistic cost
estimate for the unavoidable sequential scan, to avoid twisting the
rest of the plan in strange ways.

But perhaps these sites would be better served with different machinery
anyway. If they actually did get a sequential scan on a large table, or
any query where the estimate was very high when they were expecting
low-latency OLTP queries, perhaps they would prefer an error to some
weird plan.

And for query-planning debugging purposes, of course, it would be more
powerful to be able to enable/disable plan types per node. That would
avoid the problem of not being able to effectively test a plan without
a sequential scan on one table when another table still needs one. But
that direction...

--
greg
Greg Stark <stark@mit.edu> writes:
> The only one that gives me pause is enable_seqscan. I've seen multiple
> sites that turn it off as a hammer to force OLTP-style plans.

Yeah, that. There are definitely people using some of these switches in
production, hence relying on the current (and documented) behavior. On
the whole I doubt we can get away with that answer.

> In that case we would ideally generate a realistic cost estimate for
> the unavoidable sequential scan, to avoid twisting the rest of the
> plan in strange ways.

As I mentioned earlier, I think it might be possible to hack up the
seqscan case to avoid use of disable_cost pretty easily. It's far
easier to detect that no other plans are possible than it is once you
get to the join stage. Perhaps that's worth doing.

			regards, tom lane
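The seqscan rule Tom refers to ("generate it if enable_seqscan is on OR if we made no other path for the rel") is simple enough to sketch directly. The helper name and structure here are hypothetical, not the actual set_plain_rel_pathlist logic:

```c
/* Sketch of the "generate seqscan last" rule: add the sequential-scan
 * path only if it's enabled or nothing else was generated, so the rel
 * always ends up with at least one path and disable_cost is never
 * needed for this case. Hypothetical code, not PostgreSQL source. */
#include <stdbool.h>

static bool enable_seqscan = false;    /* pretend it's turned off */

/* npaths: number of paths already generated for this base rel,
 * assuming seqscan generation has been moved to run last. */
static bool should_add_seqscan(int npaths)
{
    return enable_seqscan || npaths == 0;
}
```

With an index path already present the seqscan is skipped outright; with no other path it is generated anyway, and either way its cost estimate stays realistic, which addresses Greg's concern about unavoidable scans twisting the rest of the plan.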