Thread: Why enable_hashjoin Completely disables HashJoin

Why enable_hashjoin Completely disables HashJoin

From
Quan Zongliang
Date:
Hi,

I found that the enable_hashjoin disables HashJoin completely.
It's in the function add_paths_to_joinrel:

if (enable_hashjoin || jointype == JOIN_FULL)
    hash_inner_and_outer(root, joinrel, outerrel, innerrel,
                jointype, &extra);

Instead, it should add a disable cost to the cost calculation of 
hashjoin. And now final_cost_hashjoin does the same thing:

if (!enable_hashjoin)
    startup_cost += disable_cost;


enable_mergejoin has the same problem.

Test case:

CREATE TABLE t_score_01(
s_id int,
s_score int,
s_course char(8),
c_id int);

CREATE TABLE t_student_01(
s_id int,
s_name char(8));

insert into t_score_01 values(
generate_series(1, 1000000), random()*100, 'course', generate_series(1, 
1000000));

insert into t_student_01 values(generate_series(1, 1000000), 'name');

analyze t_score_01;
analyze t_student_01;

SET enable_hashjoin TO off;
SET enable_nestloop TO off;
SET enable_mergejoin TO off;

explain select count(*)
from t_student_01 a join t_score_01 b on a.s_id=b.s_id;

After disabling all three, the HashJoin path should still be chosen.

Attached is the patch file.

--
Quan Zongliang
Vastdata
Attachment

Re: Why enable_hashjoin Completely disables HashJoin

From
Tomas Vondra
Date:
On 4/3/23 12:23, Quan Zongliang wrote:
> Hi,
> 
> I found that the enable_hashjoin disables HashJoin completely.
> It's in the function add_paths_to_joinrel:
> 
> if (enable_hashjoin || jointype == JOIN_FULL)
>     hash_inner_and_outer(root, joinrel, outerrel, innerrel,
>                 jointype, &extra);
> 
> Instead, it should add a disable cost to the cost calculation of
> hashjoin. And now final_cost_hashjoin does the same thing:
> 
> if (!enable_hashjoin)
>     startup_cost += disable_cost;
> 
> 
> enable_mergejoin has the same problem.
> 
> Test case:
> 
> CREATE TABLE t_score_01(
> s_id int,
> s_score int,
> s_course char(8),
> c_id int);
> 
> CREATE TABLE t_student_01(
> s_id int,
> s_name char(8));
> 
> insert into t_score_01 values(
> generate_series(1, 1000000), random()*100, 'course', generate_series(1,
> 1000000));
> 
> insert into t_student_01 values(generate_series(1, 1000000), 'name');
> 
> analyze t_score_01;
> analyze t_student_01;
> 
> SET enable_hashjoin TO off;
> SET enable_nestloop TO off;
> SET enable_mergejoin TO off;
> 
> explain select count(*)
> from t_student_01 a join t_score_01 b on a.s_id=b.s_id;
> 
> After disabling all three, the HashJoin path should still be chosen.
> 

It's not clear to me why that behavior would be desirable? Why is this
an issue you need so solve?

AFAIK the reason why some paths are actually disabled (not built at all)
while others are only penalized by adding disable_cost is that we need
to end up with at least one way to execute the query. So we pick a path
that we know is possible (e.g. seqscan) and hard-disable other paths.
But the always-possible path is only soft-disabled by disable_cost.

For joins, we do the same thing. The hash/merge joins may not be
possible, because the data types may not have hash/sort operators, etc.
Nestloop is always possible. So we soft-disable nestloop but
hard-disable hash/merge joins.

I doubt we want to change this behavior, unless there's a good reason to
do that ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Why enable_hashjoin Completely disables HashJoin

From
Tom Lane
Date:
Quan Zongliang <quanzongliang@yeah.net> writes:
> I found that the enable_hashjoin disables HashJoin completely.

Well, yeah.  It's what you asked for.

> Instead, it should add a disable cost to the cost calculation of 
> hashjoin.

Why?  The disable-cost stuff is a crude hack that we use when
turning off a particular plan type entirely might render us
unable to generate a valid plan.  Hash join is not in that
category.

> After disabling all three, the HashJoin path should still be chosen.

Why?

Personally, I'd get rid of disable_cost altogether if I could.
I'm not in a hurry to extend its use to more places.

            regards, tom lane



Re: Why enable_hashjoin Completely disables HashJoin

From
Robert Haas
Date:
On Mon, Apr 3, 2023 at 8:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Personally, I'd get rid of disable_cost altogether if I could.
> I'm not in a hurry to extend its use to more places.

I agree. I've wondered if we should put some work into that. It feels
bad to waste CPU cycles generating paths we intend to basically just
throw away, and it feels even worse if they manage to beat out some
other path on cost.

It hasn't been obvious to me how we could restructure the existing
logic to avoid relying on disable_cost. I sort of feel like it should
be a two-pass algorithm: go through and generate all the path types
that aren't disabled, and then if that results in no paths, try a
do-over where you ignore the disable flags (or just some of them). But
the code structure doesn't seem particularly amenable to that kind of
thing.

This hasn't caused me enough headaches yet that I've been willing to
invest time in it, but it has caused me more than zero headaches...

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Why enable_hashjoin Completely disables HashJoin

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 3, 2023 at 8:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Personally, I'd get rid of disable_cost altogether if I could.
>> I'm not in a hurry to extend its use to more places.

> I agree. I've wondered if we should put some work into that. It feels
> bad to waste CPU cycles generating paths we intend to basically just
> throw away, and it feels even worse if they manage to beat out some
> other path on cost.

> It hasn't been obvious to me how we could restructure the existing
> logic to avoid relying on disable_cost.

Yeah.  In some places it would not be too hard; for example, if we
generated seqscan paths last instead of first for baserels, the rule
could be "generate it if enable_seqscan is on OR if we made no other
path for the rel".  It's much messier for joins though, partly because
the same joinrel will be considered multiple times as we process
different join orderings, plus it's usually unclear whether failing
to generate any paths for joinrel X will lead to overall failure.

A solution that would work is to treat disable_cost as a form of infinity
that's counted separately from the actual cost estimate, that is we
label paths as "cost X, plus there are N uses of disabled plan types".
Then you sort first on N and after that on X.  But this'd add a good
number of cycles to add_path, which I've not wanted to expend on a
non-mainstream usage.

            regards, tom lane



Re: Why enable_hashjoin Completely disables HashJoin

From
Robert Haas
Date:
On Mon, Apr 3, 2023 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah.  In some places it would not be too hard; for example, if we
> generated seqscan paths last instead of first for baserels, the rule
> could be "generate it if enable_seqscan is on OR if we made no other
> path for the rel".  It's much messier for joins though, partly because
> the same joinrel will be considered multiple times as we process
> different join orderings, plus it's usually unclear whether failing
> to generate any paths for joinrel X will lead to overall failure.

Yeah, good point. I'm now remembering that at one point I'd had the
idea of running the whole find-a-plan-for-a-jointree step and then
running it a second time if it fails to find a plan. But I think that
requires some restructuring, because I think right now it does some
things that we should only do once we know we're definitely getting a
plan out. Or else we have to reset some state. Like if we want to go
back and maybe add more paths then we have to undo and redo whatever
set_cheapest() did.

> A solution that would work is to treat disable_cost as a form of infinity
> that's counted separately from the actual cost estimate, that is we
> label paths as "cost X, plus there are N uses of disabled plan types".
> Then you sort first on N and after that on X.  But this'd add a good
> number of cycles to add_path, which I've not wanted to expend on a
> non-mainstream usage.

Yeah, I thought of that at one point too and rejected it for the same reason.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Why enable_hashjoin Completely disables HashJoin

From
Andres Freund
Date:
Hi,

On 2023-04-03 14:04:30 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Apr 3, 2023 at 8:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Personally, I'd get rid of disable_cost altogether if I could.
> >> I'm not in a hurry to extend its use to more places.
> 
> > I agree. I've wondered if we should put some work into that. It feels
> > bad to waste CPU cycles generating paths we intend to basically just
> > throw away, and it feels even worse if they manage to beat out some
> > other path on cost.
> 
> > It hasn't been obvious to me how we could restructure the existing
> > logic to avoid relying on disable_cost.
> 
> Yeah.  In some places it would not be too hard; for example, if we
> generated seqscan paths last instead of first for baserels, the rule
> could be "generate it if enable_seqscan is on OR if we made no other
> path for the rel".  It's much messier for joins though, partly because
> the same joinrel will be considered multiple times as we process
> different join orderings, plus it's usually unclear whether failing
> to generate any paths for joinrel X will lead to overall failure.
> 
> A solution that would work is to treat disable_cost as a form of infinity
> that's counted separately from the actual cost estimate, that is we
> label paths as "cost X, plus there are N uses of disabled plan types".
> Then you sort first on N and after that on X.  But this'd add a good
> number of cycles to add_path, which I've not wanted to expend on a
> non-mainstream usage.

It sounds too hard compared to the gains, but another way could be to plan
with the relevant path generation hard disabled, and plan from scratch, with
additional scan types enabled, if we end up being unable to generate valid
plan.

Greetings,

Andres Freund



Re: Why enable_hashjoin Completely disables HashJoin

From
David Rowley
Date:
On Tue, 4 Apr 2023 at 11:18, Andres Freund <andres@anarazel.de> wrote:
> It sounds too hard compared to the gains, but another way could be to plan
> with the relevant path generation hard disabled, and plan from scratch, with
> additional scan types enabled, if we end up being unable to generate valid
> plan.

I think there would be quite a bit of work to do before we could ever
start to think about that.  The planner does quite a bit of writing on
the parse, e.g adding new RangeTblEntrys to the query's rtable. We'd
either need to fix all those first or make a copy of the parse before
planning. The latter is quite expensive today.  It's also not clear to
me how you'd know what you'd need to enable again to get the 2nd
attempt to produce a plan this time around. I'd assume you'd want the
minimum possible set of enable_* GUCs turned back on, but what would
you do in cases where there's an aggregate and both enable_hashagg and
enable_sort are both disabled and there are no indexes providing
pre-sorted input?

David

David



Re: Why enable_hashjoin Completely disables HashJoin

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> It sounds too hard compared to the gains, but another way could be to plan
> with the relevant path generation hard disabled, and plan from scratch, with
> additional scan types enabled, if we end up being unable to generate valid
> plan.

Actually, I kind of like that.  It would put the extra cost in a place
it belongs: if you have enough enable_foo turned off to prevent
generating a valid plan, it'll cost you extra to make a plan ... but
likely you'll be paying even more in runtime due to not getting a good
plan, so maybe that doesn't matter anyway.  I'd limit it to two passes:
first try honors all enable_foo switches, second try ignores all.

I'm not quite sure how this could be wedged into the existing code
structure --- in particular I am not sure that we're prepared to do
two passes of baserel path generation.  (GEQO is an existence proof
that we could handle it for join planning, though.)

Or we could rethink the design goal of not allowing enable_foo switches
to cause us to fail to make a plan.  That might be unusable though.

            regards, tom lane



Re: Why enable_hashjoin Completely disables HashJoin

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I think there would be quite a bit of work to do before we could ever
> start to think about that.  The planner does quite a bit of writing on
> the parse, e.g adding new RangeTblEntrys to the query's rtable. We'd
> either need to fix all those first or make a copy of the parse before
> planning.

Yeah, we'd have to be sure that all that preliminary work is teased apart
from the actual path-making.  I think we are probably pretty close to
that but not there yet.  Subqueries might be problematic, but perhaps
we could define our way out of that by saying that this retry principle
applies independently in each planner recursion level.

> It's also not clear to
> me how you'd know what you'd need to enable again to get the 2nd
> attempt to produce a plan this time around. I'd assume you'd want the
> minimum possible set of enable_* GUCs turned back on, but what would
> you do in cases where there's an aggregate and both enable_hashagg and
> enable_sort are both disabled and there are no indexes providing
> pre-sorted input?

As I commented concurrently, I think we should simply not try to solve
that conundrum: if you want control, don't pose impossible problems.
There's no principled way that we could decide which of enable_hashagg
and enable_sort to ignore first, for example.

            regards, tom lane



Re: Why enable_hashjoin Completely disables HashJoin

From
Quan Zongliang
Date:

On 2023/4/3 19:44, Tomas Vondra wrote:
> On 4/3/23 12:23, Quan Zongliang wrote:
>> Hi,
>>
>> I found that the enable_hashjoin disables HashJoin completely.
>> It's in the function add_paths_to_joinrel:
>>
>> if (enable_hashjoin || jointype == JOIN_FULL)
>>      hash_inner_and_outer(root, joinrel, outerrel, innerrel,
>>                  jointype, &extra);
>>
>> Instead, it should add a disable cost to the cost calculation of
>> hashjoin. And now final_cost_hashjoin does the same thing:
>>
>> if (!enable_hashjoin)
>>      startup_cost += disable_cost;
>>
>>
>> enable_mergejoin has the same problem.
>>
>> Test case:
>>
>> CREATE TABLE t_score_01(
>> s_id int,
>> s_score int,
>> s_course char(8),
>> c_id int);
>>
>> CREATE TABLE t_student_01(
>> s_id int,
>> s_name char(8));
>>
>> insert into t_score_01 values(
>> generate_series(1, 1000000), random()*100, 'course', generate_series(1,
>> 1000000));
>>
>> insert into t_student_01 values(generate_series(1, 1000000), 'name');
>>
>> analyze t_score_01;
>> analyze t_student_01;
>>
>> SET enable_hashjoin TO off;
>> SET enable_nestloop TO off;
>> SET enable_mergejoin TO off;
>>
>> explain select count(*)
>> from t_student_01 a join t_score_01 b on a.s_id=b.s_id;
>>
>> After disabling all three, the HashJoin path should still be chosen.
>>
> 
> It's not clear to me why that behavior would be desirable? Why is this
> an issue you need so solve?
> 
Because someone noticed that when he set enable_hashjoin, 
enable_mergejoin and enable_nestloop to off. The statement seemed to get 
stuck (actually because it chose the NestedLoop path, which took a long 
long time to run).
If enable_hashjoin and enable_nestloop disable generating these two 
paths. Then enable_nestloop should do the same thing, but it doesn't.

> AFAIK the reason why some paths are actually disabled (not built at all)
> while others are only penalized by adding disable_cost is that we need
> to end up with at least one way to execute the query. So we pick a path
> that we know is possible (e.g. seqscan) and hard-disable other paths.
> But the always-possible path is only soft-disabled by disable_cost.
> 
> For joins, we do the same thing. The hash/merge joins may not be
> possible, because the data types may not have hash/sort operators, etc.
> Nestloop is always possible. So we soft-disable nestloop but
> hard-disable hash/merge joins.
> 
> I doubt we want to change this behavior, unless there's a good reason to
> do that ...
It doesn't have to change. Because selecting NestedLoop doesn't really 
get stuck either. It just takes too long to run.

I will change the patch status to Withdrawn.
> 
> 
> regards
> 




Re: Why enable_hashjoin Completely disables HashJoin

From
Robert Haas
Date:
On Tue, Apr 4, 2023 at 3:38 AM Quan Zongliang <quanzongliang@yeah.net> wrote:
> Because someone noticed that when he set enable_hashjoin,
> enable_mergejoin and enable_nestloop to off. The statement seemed to get
> stuck (actually because it chose the NestedLoop path, which took a long
> long time to run).
> If enable_hashjoin and enable_nestloop disable generating these two
> paths. Then enable_nestloop should do the same thing, but it doesn't.

This all seems like expected behavior. If you disable an entire plan
type, you should expect to get some bad plans. And if you disable all
the plan types, you should still expect to get some plan, but maybe an
extremely bad one.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Why enable_hashjoin Completely disables HashJoin

From
Greg Stark
Date:
On Mon, 3 Apr 2023 at 19:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Or we could rethink the design goal of not allowing enable_foo switches
> to cause us to fail to make a plan.  That might be unusable though.

Off the top of my head I don't see why. It's not like the possible
plans are going to change on you often, only when DDL changes the
schema.

The only one that gives me pause is enable_seqscan. I've seen multiple
sites that turn it off as a hammer to force OLTP-style plans. They
still get sequential scans where they're absolutely necessary such as
small reference tables with no usable index and rely on that
behaviour.

In that case we would ideally generate a realistic cost estimate for
the unavoidable sequential scan to avoid twisting the rest of the plan
in strange ways.

But perhaps these sites would be better served with different
machinery anyways. If they actually did get a sequential scan on a
large table or any query where the estimate was very high where they
were expecting low latency OLTP queries perhaps they would prefer to
get an error than some weird plan anyways.

And for query planning debugging purposes of course it would be more
powerful to be able to enable/disable plan types per-node. That would
avoid the problem of not being able to effectively test a plan without
a sequential scan on one table when another table still needs it. But
that direction...

-- 
greg



Re: Why enable_hashjoin Completely disables HashJoin

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> On Mon, 3 Apr 2023 at 19:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Or we could rethink the design goal of not allowing enable_foo switches
>> to cause us to fail to make a plan.  That might be unusable though.

> The only one that gives me pause is enable_seqscan. I've seen multiple
> sites that turn it off as a hammer to force OLTP-style plans.

Yeah, that.  There are definitely people using some of these switches
in production, hence relying on the current (and documented) behavior.
On the whole I doubt we can get away with that answer.

> In that case we would ideally generate a realistic cost estimate for
> the unavoidable sequential scan to avoid twisting the rest of the plan
> in strange ways.

As I mentioned earlier, I think it might be possible to hack up the
seqscan case to avoid use of disable_cost pretty easily.  It's far
easier to detect that no other plans are possible than it is once
you get to the join stage.  Perhaps that's worth doing.

            regards, tom lane