Re: Use of additional index columns in rows filtering - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Use of additional index columns in rows filtering
Msg-id CAH2-WzkbRk+C_2pthU0vu7YSUY1=Dwgwp0BWZWRvraQTjY7Rng@mail.gmail.com
In response to Re: Use of additional index columns in rows filtering  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Thu, Aug 3, 2023 at 2:46 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Sure, having more choices means a risk of making mistakes. But does
> simply postponing the choices to runtime actually solve this?

It solves that one problem, yes. This is particularly important in
cases where we would otherwise get truly pathological performance --
not just mediocre or bad performance. Most of the time, mediocre
performance isn't such a big deal.

Having a uniform execution strategy for certain kinds of index scans
is literally guaranteed to beat a static strategy in some cases. For
example, with some SAOP scans (with my patch), we'll have to skip lots
of the index, and then scan lots of the index -- just because of a
bunch of idiosyncratic details that are almost impossible to predict
using statistics. Such an index scan really shouldn't be considered
"moderately skippy". It is not the average of two opposite things --
it is more like two separate things that are opposites.
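
As a rough illustration of how one scan can be both things at once, here
is a toy Python model of a `key = ANY(...)` scan over a sorted list of
keys. It is only a sketch under stated assumptions, not code from the
patch or from nbtree: the names `saop_scan` and `PAGE_SIZE`, the fixed
page size, and the "jump if the target is more than one page ahead" rule
are all hypothetical.

```python
from bisect import bisect_left, bisect_right

PAGE_SIZE = 4  # hypothetical: number of index tuples per leaf page


def saop_scan(index_keys, wanted):
    """Toy scan for `key = ANY(wanted)` over a sorted list of keys.

    Returns (matching keys, leaf pages read, number of skips).
    """
    matches = []
    pages = []  # leaf pages, in the order they are first read
    skips = 0
    for value in sorted(set(wanted)):
        lo = bisect_left(index_keys, value)
        hi = bisect_right(index_keys, value)
        if lo == len(index_keys):
            break  # past the end of the index; later values are too
        first_page = lo // PAGE_SIZE
        last_page = max(lo, hi - 1) // PAGE_SIZE
        if pages and first_page > pages[-1] + 1:
            skips += 1  # target is far ahead: jump instead of stepping right
        for page in range(first_page, last_page + 1):
            if not pages or page > pages[-1]:
                pages.append(page)
        matches.extend(index_keys[lo:hi])
    return matches, len(pages), skips


# 40 keys on 10 leaf pages; the array is dense at the start, then far away.
index_keys = list(range(40))
wanted = list(range(12)) + [36, 37, 38]
matches, pages_read, skips = saop_scan(index_keys, wanted)
# pages_read == 4 (pages 0, 1, 2 and 9), skips == 1
```

In this toy run the same scan reads three adjacent pages and then jumps
over six. Which of the two behaviors dominates depends on where the
array values happen to fall in the index.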

It's true that even this "moderately skippy" case needs to be costed.
But if we can entirely eliminate variation that really looks like
noise, it should be more likely that we'll get the cheapest plan.
Costing may not be any easier, but getting the cheapest plan might be.

> > 1. An "access predicate" is always strictly better than an equivalent
> > "index filter predicate" (for any definition of "index filter
> > predicate" you can think of).
>
> Yes, probably.
>
> > 2. An "Index Filter: " is always strictly better than an equivalent
> > "Filter: " (i.e. table filter).
>
> Not sure about this. As I explained earlier, I think it needs to
> consider the cost/selectivity of the predicate, and the fraction of
> all-visible pages. But yes, it's a static decision.

What I said is limited to "equivalent" predicates. If it's just not
possible to get an "access predicate" at all, then my point 1 doesn't
apply. Similarly, if it just isn't possible to get an "Index Filter"
(only a table filter), then my point 2 doesn't apply.
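
To make "equivalent" concrete, the following toy Python model evaluates
the same qual, `a = 3 AND b = 7`, against an index on (a, b) in three
ways: with `b` as an access predicate, as an index filter, and as a
table filter. This is a sketch under simplifying assumptions, not
PostgreSQL code: the function `scan` and the mode names are hypothetical,
and the model deliberately ignores predicate evaluation cost and the
visibility map, which are the factors raised in the quoted reply.

```python
from bisect import bisect_left, bisect_right


def scan(index, a, b, mode):
    """Toy model of `a = ? AND b = ?` on a sorted list of (a, b) entries.

    Each entry stands for one index tuple pointing at one heap row.
    Returns (index tuples read, heap fetches, rows returned).
    """
    if mode == "access":
        # Both columns bound the scan: only matching index tuples are read.
        lo, hi = bisect_left(index, (a, b)), bisect_right(index, (a, b))
    elif mode in ("index_filter", "table_filter"):
        # Only `a` bounds the scan; `b` is checked afterwards.
        a_keys = [entry[0] for entry in index]
        lo, hi = bisect_left(a_keys, a), bisect_right(a_keys, a)
    else:
        raise ValueError(f"unknown mode: {mode}")

    heap_fetches = 0
    rows = 0
    for entry in index[lo:hi]:
        if mode == "index_filter" and entry[1] != b:
            continue  # rejected using the index tuple alone, no heap fetch
        heap_fetches += 1
        if entry[1] == b:  # for "table_filter", stands in for the heap check
            rows += 1
    return hi - lo, heap_fetches, rows


index = sorted((a, b) for a in range(10) for b in range(100))
for mode in ("access", "index_filter", "table_filter"):
    print(mode, scan(index, 3, 7, mode))
# access       -> (1, 1, 1)
# index_filter -> (100, 1, 1)
# table_filter -> (100, 100, 1)
```

All three modes return the same row. Within this simplified model, they
differ only in how many index tuples are read and how many heap fetches
are made.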

This does mean that there could still be competition between multiple
index paths for the same composite index, but I have no objections to
that -- it makes sense to me because it isn't duplicative in the way
that I'm concerned about. It just isn't possible to delay anything
until run time in this scenario, so nothing that I've said should
apply.

> (I didn't say it explicitly, but this assumes those paths are not for
> the same index. If they were, then PATH #3 would have to exist too.)

That changes everything, then. If they're completely different indexes
then nothing I've said should apply. I can't think of a way to avoid
making an up-front commitment to one particular index in the planner
(I'm thinking of far more basic things than that).

> I feel a bit like the rubber duck from [1], but I'm OK with that ;-)

Not from my point of view. Besides, even when somebody says that they
just don't understand what I'm saying at all (which wasn't ever fully
the case here), that is often useful feedback in itself.

--
Peter Geoghegan


