Re: On disable_cost - Mailing list pgsql-hackers

From Robert Haas
Subject Re: On disable_cost
Date
Msg-id CA+TgmoYiWfNq-oByseKphKn+GWZhL55PwOtSZ87NBE1CVYX7GQ@mail.gmail.com
Whole thread Raw
In response to Re: On disable_cost  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: On disable_cost
List pgsql-hackers
Thanks for the review!

On Tue, Jul 2, 2024 at 10:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> v3-0001-Remove-grotty-use-of-disable_cost-for-TID-scan-pl.patch:
>
> +1, this seems ready for commit

Cool.

> v3-0002-Rationalize-behavior-of-enable_indexscan-and-enab.patch:
>
> I fear this will break people's applications, if they are currently
> forcing a sequential scan with "set enable_indexscan=off". Now they will
> need to do "set enable_indexscan=off; set enable_indexonlyscan=off" for
> the same effect. Maybe it's acceptable, disabling sequential scans to
> force an index scan is much more common than the other way round.

Well, I think it's pretty important that the GUC does what the name
and documentation say it does. One could of course argue that we ought
not to have two different GUCs -- or perhaps even that we ought not to
have two different plan nodes -- and I think those arguments might be
quite defensible. One could also argue for another interface, like a
GUC enable_indexscan and a value that is a comma-separated list
consisting of plain, bitmap, and index-only, or
none/0/false/any/1/true -- and that might also be quite defensible.
But I don't think one can have a GUC called enable_indexscan and
another GUC called enable_indexonlyscan and argue that it's OK for the
former one to affect both kinds of scans. That's extremely confusing
and, well, just plain wrong. I think this is a bug, and I'm not going
to back-patch the fix precisely because of the considerations you
note, but I really don't think we can leave it like this. The current
behavior is so nonsensical that the code is essentially unmaintable,
or at least I think it is.

> v3-0003-Treat-number-of-disabled-nodes-in-a-path-as-a-sep.patch:
>
> > @@ -1318,6 +1342,12 @@ cost_tidscan(Path *path, PlannerInfo *root,
> >       startup_cost += path->pathtarget->cost.startup;
> >       run_cost += path->pathtarget->cost.per_tuple * path->rows;
> >
> > +     /*
> > +      * There are assertions above verifying that we only reach this function
> > +      * either when enable_tidscan=true or when the TID scan is the only legal
> > +      * path, so it's safe to set disabled_nodes to zero here.
> > +      */
> > +     path->disabled_nodes = 0;
> >       path->startup_cost = startup_cost;
> >       path->total_cost = startup_cost + run_cost;
> >  }
>
> So if you have enable_tidscan=off, and have a query with "WHERE CURRENT
> OF foo" that is planned with a TID scan, we set disable_nodes = 0? That
> sounds wrong, shouldn't disable_nodes be 1 in that case? It probably
> cannot affect the rest of the plan, given that "WHERE CURRENT OF" is
> only valid in an UPDATE or DELETE, but still. At least it deserves a
> better explanation in the comment.

So, right now, when the planner disregards enable_WHATEVER because it
thinks it's the only way to implement something, it doesn't add
disable_cost. So, I made the patch not increment disabled_nodes in
that case. Maybe we want to rethink that choice at some point, but it
doesn't seem like a good idea to do it right now. I've found while
working on this stuff that it's super-easy to have seemingly innocuous
changes disturb regression test results, and I don't really want to
have a bunch of extra regression test changes that are due to
rethinking things other than disable_cost -> disabled_nodes. So for
now I'd like to increment disabled_nodes in just the cases where we
currently add disable_cost.

> left over from debugging?

Yeah, will fix.

> Is "unlikely()" really appropriate here (and elsewhere in the patch)? If
> you run with enable_seqscan=off but have no indexes, you could take that
> path pretty often.

That's true, but I think it's right to assume that's the uncommon
case. If we speed up planning for people who disabled sequential scans
and slow it down for people running with a normal planner
configuration, no one will thank us.

> If this function needs optimizing, I'd suggest splitting it into two
> functions, one for comparing the startup cost and another for the total
> cost. Almost all callers pass a constant for that argument, so they
> might as well call the correct function directly and avoid the branch
> for that.

That's not a bad idea but seems like a separate patch.

> The "Cost comparisons here should match compare_path_costs_fuzzily"
> comment also applies to the check on total_cost that you moved up. Maybe
> move up the comment to the beginning of the loop.

Will have a look.

> v3-0004-Show-number-of-disabled-nodes-in-EXPLAIN-ANALYZE-.patch:
>
> It's surprising that the "Disable Nodes" is printed even with the COSTS
> OFF option. It's handy for our regression tests, it's good to print them
> there, but it feels wrong.

I'm open to doing what people think is best here. Although we're
regarding them as part of the cost for purposes of how to compare
paths, they're not unpredictable in the way that costs are, so I think
the current handling is defensible and, as you say, it's useful for
the regression tests. However, I'm not going to fight tooth and nail
if people really want it the other way.

> Could we cram it into the "cost=... rows=..." part? And perhaps a marker
> that a node was disabled would be more user friendly than showing the
> cumulative count? Something like:

The problem is that we'd have to derive that. What we actually know is
the disable count; to figure out whether the node itself was disabled,
we'd have to subtract the value for the underlying nodes back out.
That seems like it might be buggy or confusing.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: What is a typical precision of gettimeofday()?
Next
From: Hannu Krosing
Date:
Subject: Re: What is a typical precision of gettimeofday()?