Re: On disable_cost - Mailing list pgsql-hackers

From Robert Haas
Subject Re: On disable_cost
Date
Msg-id CA+TgmoYUQq3C8Uo-YdDapo2vv84+XqsVi3J2R1RFGC0rf-bkgA@mail.gmail.com
Whole thread Raw
In response to Re: On disable_cost  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Mon, May 6, 2024 at 4:30 PM Peter Geoghegan <pg@bowt.ie> wrote:
> FWIW I always found those weird inconsistencies to be annoying at
> best, and confusing at worst. I speak as somebody that uses
> disable_cost a lot.
>
> I certainly wouldn't ask anybody to make it a priority for that reason
> alone -- it's not *that* bad. I've given my opinion on this because
> it's already under discussion.

Thanks, it's good to have other perspectives.

Here are some patches for discussion.

0001 gets rid of disable_cost as a mechanism for forcing a TID scan
plan to be chosen when CurrentOfExpr is present. Instead, it arranges
to generate only the valid path when that case occurs, and skip
everything else. I think this is a good cleanup, and it doesn't seem
totally impossible that it actually prevents a failure in some extreme
case.

0002 cleans up the behavior of enable_indexscan and
enable_indexonlyscan. Currently, setting enable_indexscan=false adds
disable_cost to both the cost of index scans and the cost of
index-only scans. I think that's indefensible and, in fact, a bug,
although I believe David Rowley disagrees. With this patch, we simply
don't generate index scans if enable_indexscan=false, and we don't
generate index-only scans if enable_indexonlyscan=false, which seems a
lot more consistent to me. However, I did revise one major thing from
the patch I posted before, per feedback from David Rowley and also per
my own observations: in this version, if enable_indexscan=true and
enable_indexonlyscan=false, we'll generate index-scan paths for any
cases where, with both set to true, we would have only generated
index-only scan paths. That change makes the behavior of this patch a
lot more comprehensible and intuitive: the only regression test
changes are places where somebody expected that they could disable
both index scans and index-only scans by setting
enable_indexscan=false.

0003 and 0004 extend the approach of "just don't generate the disabled
path" to bitmap scans and gather merge, respectively. I think these
are more debatable, mostly because it's not clear how far we can
really take this approach. Neither breaks any test cases, and 0003 is
closely related to the work done in 0002, which seems like a point in
its favor. 0004 was simply the only other case where it was obvious to
me that this kind of approach made sense. In my view, it makes most
sense to use this kind of approach for planner behaviors that seem
like they're sort of optional: like if you don't use gather merge, you
can still use gather, and if you don't use index scans, you can still
use sequential scans. With all these patches applied, the remaining
cases where we rely on disable_cost are:

sequential scans
sorts
hash aggregation
all 3 join types
hash joins where a bucket holding the inner MCV would exceed hash_mem

Sequential scans are clearly a last-ditch method. I find it a bit hard
to decide whether hashing or sorting is the default, especially giving
the asymmetry between enable_sort - presumptively anywhere - and
enable_hashagg - specific to aggregation. As for the join types, it's
tempting to consider nested-loop the default type -- it's the only way
to satisfy parameterizations, for instance -- but the fact that it's
the only method that can't do a full join undermines that position in
my book. But, I don't want to pretend like I have all the answers
here, either; I'm just sharing some thoughts.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: pg_sequence_last_value() for unlogged sequences on standbys
Next
From: Michail Nikolaev
Date:
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements