Re: Use of additional index columns in rows filtering - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Use of additional index columns in rows filtering
Date
Msg-id bf678c8f-2094-a52b-357e-7cc7c7813b41@enterprisedb.com
Whole thread Raw
In response to Re: Use of additional index columns in rows filtering  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Use of additional index columns in rows filtering
Re: Use of additional index columns in rows filtering
List pgsql-hackers

On 8/3/23 21:21, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 11:17 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Not sure. I'm a bit confused about what exactly is so risky on the plan
>> produced with the patch.
> 
> It's all about the worst case. In the scenarios that I'm concerned
> about, we can be quite sure that the saving from not using a BitmapOr
> will be fairly low -- the cost of not having to repeat the same index
> page accesses across several similar index scans is, at best, some
> small multiple of the would-be number of index scans that the BitmapOr
> plan gets. We can be certain that the possible benefits are fixed and
> low. This is always true; presumably the would-be BitmapOr plan can
> never have all that many index scans. And we always know how many
> index scans a BitmapOr plan would use up front.
> 

When you say "index page accesses" do you mean accesses to index pages,
or accesses to heap pages from the index scan?

Because my patch is all about reducing the heap pages, which are usually
the expensive part of the index scan. But you're right the "index scan"
with index filter may access more index pages, because it has fewer
"access predicates".

I don't quite see that with the tenk1 query we've been discussing (the
extra buffers were due to non-allvisible heap pages), but I guess that's
possible.

> On the other hand, the possible downsides have no obvious limit. So
> even if we're almost certain to win on average, we only have to be
> unlucky once to lose all we gained before that point. As a general
> rule, we want the index AM to have all the context required to
> terminate its scan at the earliest possible opportunity. This is
> enormously important in the worst case.
> 

Yeah, I agree there's some asymmetry in the risk/benefit. It's not
unlike e.g. seqscan vs. index scan, where the index scan can't really
save more than what the seqscan costs, but it can get (almost)
arbitrarily expensive.

> It's easier for me to make this argument because I know that we don't
> really need to make any trade-off here. But even if that wasn't the
> case, I'd probably arrive at the same general conclusion.
> 
> Importantly, it isn't possible to make a similar argument that works
> in the opposite direction -- IMV that's the difference between this
> flavor of riskiness, and the inevitable riskiness that comes with any
> planner change. In other words, your patch isn't going to win by an
> unpredictably high amount. Not in the specific scenarios that I'm
> focussed on here, with a BitmapOr + multiple index scans getting
> displaced.
> 

True. It probably can't beat BitmapOr plan if it means moving access
predicate into index filter (or even worse a table filter).

> The certainty about the upside is just as important as the uncertainty
> about the downside. The huge asymmetry matters, and is fairly
> atypical. If, somehow, there was less certainty about the possible
> upside, then my argument wouldn't really work.
> 


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Use of additional index columns in rows filtering
Next
From: Tatsuo Ishii
Date:
Subject: Re: Using defines for protocol characters