Re: Use of additional index columns in rows filtering - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Use of additional index columns in rows filtering
Date
Msg-id b6c01583-a7e6-247c-2369-524d284b72ac@enterprisedb.com
Whole thread Raw
In response to Re: Use of additional index columns in rows filtering  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Use of additional index columns in rows filtering
List pgsql-hackers

On 8/3/23 20:50, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 4:57 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>> I understand that that's how the patch is structured. It is
>>> nevertheless possible (as things stand) that the patch will make the
>>> planner shift from a plan that uses "Access Predicates" to the maximum
>>> extent possible when scanning a composite index, to a similar plan
>>> that has a similar index scan, for the same index, but with fewer
>>> "Access Predicates" in total. In effect, the patched planner will swap
>>> one type of predicate for another type because doing so enables the
>>> executor to scan an index once instead of scanning it several times.
>>>
>>
>> That seems very much like something the costing is meant to handle, no?
>> I mean, surely "access predicate" and "filter" should affect the cost
>> differently, with "filter" being more expensive (and table filter being
>> more expensive than index filter).
> 
> I'm not 100% sure that it's not a costing issue, but intuitively it
> doesn't seem like one.
> 
> As Goetz Graefe once said, "choice is confusion". It seems desirable
> to have fewer, better index paths. This is possible whenever there is
> a way to avoid the index paths that couldn't possibly be better in the
> first place. Though we must also make sure that there is no real
> downside -- possibly by teaching the executor to behave adaptively
> instead of needlessly trusting what the planner says. Turning a plan
> time decision into a runtime decision seems strictly better.
> 

Sure, having more choices means a risk of making mistakes. But does
simply postponing the choices to runtime actually solves this? Even if
you're able to do a perfect choice at that point, it's only works for
that particular path type (e.g. index scan). You still need to cost it
somehow, to decide which path type to pick ...

Perhaps my mental model of what you intend to do is wrong?

> Obviously the planner will always need to be trusted to a significant
> degree (especially for basic things like join order), but why not
> avoid it when we can avoid it without any real downsides? Having lots
> of slightly different index paths with slightly different types of
> logically equivalent predicates seems highly undesirable, and quite
> avoidable.
> 
> ISTM that it should be possible to avoid generating some of these
> index paths based on static rules that assume that:
> 
> 1. An "access predicate" is always strictly better than an equivalent
> "index filter predicate" (for any definition of "index filter
> predicate" you can think of).

Yes, probably.

> 2. An "Index Filter: " is always strictly better than an equivalent
> "Filter: " (i.e. table filter).

Not sure about this. As I explained earlier, I think it needs to
consider the cost/selectivity of the predicate, and fraction of
allvisible pages. But yes, it's a static decision.

> 
> The first item is what I've been going on about, of course. The second
> item is the important principle behind your patch -- and one that I
> also agree with. I don't see any contradictions here -- these two
> principles are compatible. I think that we can have it both ways.
> 
>> AFAICS the assumption is that path #1 should be better, as it has two
>> proper access predicates. But maybe if we add another condition C, it
>> might end up like this:
>>
>>   PATH #1: access predicates (A,B), table filter C
>>   PATH #2: access predicate A, index filter (B,C)
>>
>> and #2 will end up winning.
> 
> Why wouldn't we expect there to also be this path:
> 
> PATH #3: access predicates (A,B), index filter C
> 

Why? Maybe the index doesn't have all the columns needed for condition
C, in which case it has to be evaluated as a table filter.

(I didn't say it explicitly, but this assumes those paths are not for
the same index. If they were, then PATH #3 would have to exist too.)

> And why wouldn't we also expect this other path to always be better?
> So much better that we don't even need to bother generating PATH #1
> and PATH #2 in the first place, even?
> 

Yes, I agree that path would likely be superior to the other paths.

> Right now there are weird reasons why it might not be so -- strange
> interactions with things like BitmapOr nodes that could make either
> PATH #1 or PATH #2 look slightly cheaper. But that doesn't seem
> particularly fundamental to me. We should probably just avoid those
> plan shapes that have the potential to make PATH #1 and PATH #2
> slightly cheaper, due only to perverse interactions.
> 
>> I like the discussion, but it feels a bit abstract (and distant from
>> what the patch aimed to do) and I have trouble turning it into something
>> actionable.
> 
> I think that I have gotten a lot out of this discussion -- it has made
> my thinking about this stuff more rigorous. I really appreciate that.
> 

I feel a bit like the rubber duck from [1], but I'm OK with that ;-)

>> Does this apply to the index scan vs. index-only scans too? That is, do
>> you think we should we have just one index-scan path, doing index-only
>> stuff when possible?
> 
> I think so, yes. But index-only scans don't appear under BitmapOr
> nodes, so right now I can't think of an obvious way of demonstrating
> that this is true. Maybe it accidentally doesn't come up with
> index-only scans in practice, but the same underlying principles
> should be just as true.
> 
>> If we can form some sort of plan what needs to be done (both for my
>> patch and for the SAOP patch), I'm willing to work on it ... But it's
>> not quite clear to me what the requirements are.
> 
> I do hope to have more concrete proposals soon. Thanks for being patient.
> 
> For what it's worth, I actually think that there is a good chance that
> I'll end up relying on what you've done here to make certain things I
> want to do with the SOAP patch okay. It would be rather convenient to
> be able to handle some of the SAOP safety issues without needing any
> table filters (just index filters), in some corner cases. I think that
> what you're doing here makes a lot of sense. FWIW, I am already
> personally invested in the success of your patch.
> 

Great! I'm happy those bits are likely useful for what you're doing.

regards


[1] https://en.wikipedia.org/wiki/Rubber_duck_debugging

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Dave Cramer
Date:
Subject: Re: Using defines for protocol characters
Next
From: Tomas Vondra
Date:
Subject: Re: Use of additional index columns in rows filtering