Home > mailing lists

Re: Use of additional index columns in rows filtering - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Use of additional index columns in rows filtering
Date	August 7, 2023 01:28:51
Msg-id	CAH2-Wzkh9udhHPwTK_XJvrcWwNp50FeZ_-1sr2UYCGAoH9n0Jw@mail.gmail.com Whole thread Raw
In response to	Re: Use of additional index columns in rows filtering (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: Use of additional index columns in rows filtering
List	pgsql-hackers

Tree view

On Sun, Aug 6, 2023 at 1:13 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Since you're not relying on the nbtree work at all here, really (just
> on the transformation process itself), the strategic risk that this
> adds to your project isn't too great. It's not like this ties the
> success of your patch to the success of my own patch. At most it ties
> the success of your patch to something like Alena Rybakina's
> OR-to-SAOP transformation patch, which seems manageable to me. (To be
> clear, I'm not relying on that work in the same way myself -- for my
> patch the transformation process is just a nice-to-have.)

I decided to verify my understanding by checking what would happen
when I ran the OR-heavy tenk1 regression test query against a
combination of your patch, and v7 of the OR-to-SAOP transformation
patch. (To be clear, this is without my patch.)

I found that the problem that I saw with the OR-heavy tenk1 regression
test goes away (though only when I "set or_transform_limit=0"). That
is, we'll get an index scan plan that uses a SAOP. This index scan
plan is comparable to the master branch's BitmapOr scan. In
particular, both plans get 7 buffer hits. More importantly, the new
plan is (like the master branch plan) not risky in the way I've been
going on about.

This does mean that your patch gets a *slightly* slower plan, due to
the issue of added index page accesses. Improving that should be a job
for my patch -- it's not your problem, since there is no regression.

I'm not sure if it's somehow still possible that SAOP expression
evaluation is able to "do the risky thing" in the same way that your
patch's "Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3)
OR (tenk1.tenthous = 42))" plan. But it certainly looks like it can't.
Increasingly, the problem here appears to me to be a problem of
lacking useful CNF transformations/normalization -- nothing more.
Structuring things so that we reliably use "the native representation
of ORs" via normalization seems likely to be all you really need.

We may currently be over relying on a similar process that happens
indirectly, via BitmapOr paths. I suspect that you're right to be
concerned about how this might already be affecting index-only scans.
Once we have CNF normalization/transformation in place, we will of
course continue to use some BitmapOr plans that may look a little like
the ones I'm focussed on, and some plans that use "index filters" (due
to your patch) that are also a little like that. But there's nothing
objectionable about those cases IMV (quite the opposite), since there
is no question of displacing/out-competing very similar plans that can
use index quals. (You might also find a way to avoid ever requiring
heap access/visibility checks for a subset of the "index filter" cases
where it is determined to be safe up front, but that's just a bonus.)

--
Peter Geoghegan

pgsql-hackers by date:

From: Thomas Munro
Date: 07 August 2023, 01:15:36
Subject: Re: A failure in 031_recovery_conflict.pl on Debian/s390x

From: Tom Lane
Date: 07 August 2023, 01:30:49
Subject: Re: Sync scan & regression tests

Re: Use of additional index columns in rows filtering - Mailing list pgsql-hackers

Previous

Next