Re: Use of additional index columns in rows filtering - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Use of additional index columns in rows filtering
Date
Msg-id CAH2-Wzkh9udhHPwTK_XJvrcWwNp50FeZ_-1sr2UYCGAoH9n0Jw@mail.gmail.com
Whole thread Raw
In response to Re: Use of additional index columns in rows filtering  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Use of additional index columns in rows filtering
List pgsql-hackers
On Sun, Aug 6, 2023 at 1:13 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Since you're not relying on the nbtree work at all here, really (just
> on the transformation process itself), the strategic risk that this
> adds to your project isn't too great. It's not like this ties the
> success of your patch to the success of my own patch. At most it ties
> the success of your patch to something like Alena Rybakina's
> OR-to-SAOP transformation patch, which seems manageable to me. (To be
> clear, I'm not relying on that work in the same way myself -- for my
> patch the transformation process is just a nice-to-have.)

I decided to verify my understanding by checking what would happen
when I ran the OR-heavy tenk1 regression test query against a
combination of your patch, and v7 of the OR-to-SAOP transformation
patch. (To be clear, this is without my patch.)

I found that the problem that I saw with the OR-heavy tenk1 regression
test goes away (though only when I "set or_transform_limit=0"). That
is, we'll get an index scan plan that uses a SAOP. This index scan
plan is comparable to the master branch's BitmapOr scan. In
particular, both plans get 7 buffer hits. More importantly, the new
plan is (like the master branch plan) not risky in the way I've been
going on about.

This does mean that your patch gets a *slightly* slower plan, due to
the issue of added index page accesses. Improving that should be a job
for my patch -- it's not your problem, since there is no regression.

I'm not sure if it's somehow still possible that SAOP expression
evaluation is able to "do the risky thing" in the same way that your
patch's "Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3)
OR (tenk1.tenthous = 42))" plan. But it certainly looks like it can't.
Increasingly, the problem here appears to me to be a problem of
lacking useful CNF transformations/normalization -- nothing more.
Structuring things so that we reliably use "the native representation
of ORs" via normalization seems likely to be all you really need.

We may currently be over relying on a similar process that happens
indirectly, via BitmapOr paths. I suspect that you're right to be
concerned about how this might already be affecting index-only scans.
Once we have CNF normalization/transformation in place, we will of
course continue to use some BitmapOr plans that may look a little like
the ones I'm focussed on, and some plans that use "index filters" (due
to your patch) that are also a little like that. But there's nothing
objectionable about those cases IMV (quite the opposite), since there
is no question of displacing/out-competing very similar plans that can
use index quals. (You might also find a way to avoid ever requiring
heap access/visibility checks for a subset of the "index filter" cases
where it is determined to be safe up front, but that's just a bonus.)

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: A failure in 031_recovery_conflict.pl on Debian/s390x
Next
From: Tom Lane
Date:
Subject: Re: Sync scan & regression tests