Re: Use of additional index columns in rows filtering - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Use of additional index columns in rows filtering
Msg-id: 9b792bdb-ccfb-a07d-beb3-39375d551790@enterprisedb.com
In response to: Re: Use of additional index columns in rows filtering (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On 8/4/23 02:08, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> When you say "index page accesses" do you mean accesses to index pages,
>> or accesses to heap pages from the index scan?
>
> Yes, that's exactly what I mean. Note that that's the dominant cost
> for the original BitmapOr plan.
>

Well, I presented multiple options, so "yes" doesn't really clarify
which of them applies. But my understanding is you meant the index page
accesses.

> As I said upthread, the original BitmapOr plan has 7 buffer hits. The
> breakdown is 1 single heap page access, 3 root page accesses, and 3
> leaf page accesses. There is only 1 heap page access because only 1
> out of the 3 index scans that feed into the BitmapOr actually end up
> finding any matching rows in the index.
>
> In short, the dominant cost here is index page accesses. It's a
> particularly good case for my SAOP patch!
>

Understood. It makes sense considering the SAOP patch is all about
optimizing the index walk / processing fewer pages.

>> Because my patch is all about reducing the heap pages, which are usually
>> the expensive part of the index scan. But you're right the "index scan"
>> with index filter may access more index pages, because it has fewer
>> "access predicates".
>
> The fact is that your patch correctly picks the cheapest plan, which
> is kinda like a risky version of the plan that my SAOP patch would
> pick -- it is cheaper for the very same reason. I understand that
> that's not your intention at all, but this is clearly what happened.
> That's what I meant by "weird second order effects".
>
> To me, it really does kinda look like your patch accidentally
> discovered a plan that's fairly similar to the plan that my SAOP patch
> would have found by design! Perhaps I should have been clearer on this
> point earlier. (If you're only now seeing this for yourself for the
> first time, then...oops.
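For readers joining the thread here, the query under discussion has roughly this shape (a sketch against the regression-test tenk1 table; the exact predicates are reconstructed from memory of earlier messages, so treat them as illustrative):

```sql
-- Illustrative only: an OR query of the kind discussed above. With
-- separate indexes covering the predicates, the planner can answer it
-- with a BitmapOr: one index descent per OR arm (root + leaf page
-- each), then a single shared pass over the matching heap pages.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM tenk1
WHERE thousand = 42
  AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
```

The BUFFERS option is what surfaces the per-plan buffer-hit counts being compared in this thread.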
> No wonder you were confused about which
> patch it was I was going on about!)
>

Thanks. I think I now see the relationship between the plan with my
patch and your SAOP patch. It's effectively very similar, except that
the responsibilities are split a bit differently. With my patch the OR
clause happens outside the AM, while the SAOP patch would do that in
the AM and also use that to walk the index more efficiently.

>> I don't quite see that with the tenk1 query we've been discussing (the
>> extra buffers were due to non-allvisible heap pages), but I guess that's
>> possible.
>
> The extra buffer hits occur because I made them occur by inserting new
> tuples where thousand = 42. Obviously, I did it that way because I had
> a point that I wanted to make. Obviously, there wouldn't have been any
> notable regression from your patch at all if I had (say) inserted
> tuples where thousand = 43 instead. (Not for the original "42" query,
> at least.)
>

Right. I know where the heap accesses come from, but clearly we're able
to skip those when allvisible=true, and we don't need to scan more
index pages either. But I guess we could make the data more complex to
make this part worse (for my patch, while the SAOP patch would not be
affected).

> That's part of the problem, as I see it. Local changes like that can
> have outsized impact on individual queries, even though there is no
> inherent reason to expect it. How can statistics reliably guide the
> planner here? Statistics are supposed to be a summary of the whole
> attribute, that allow us to make various generalizations during
> planning. But this plan leaves us sensitive to relatively small
> changes in one particular "thousand" grouping, with potentially
> outsized impact. And, this can happen very suddenly, because it's so
> "local".
>
> Making this plan perform robustly just doesn't seem to be one of the
> things that statistics can be expected to help us with very much.
>

I agree there certainly are cases where the estimates will be off. This
is not that different from correlated columns; in fact it's exactly the
same issue, I think. But it's also not a novel/unique issue - we should
probably do the "usual" thing, i.e. plan based on estimates (maybe with
some safety margin), and have some fallback strategy at runtime.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
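As a toy illustration of the point about summary statistics (plain Python, not planner code; the numbers are made up to mirror the tenk1/thousand=42 scenario): an estimate derived from whole-column summaries barely moves when one group is locally skewed, even though the actual count for that group changes drastically.

```python
# Toy sketch: a planner-style per-group row estimate computed from
# whole-attribute summary statistics (total rows / n_distinct), and how
# a purely local change leaves that estimate almost untouched.
from collections import Counter

# A column like tenk1.thousand: 10,000 rows, values 0..999, uniform.
rows = [i % 1000 for i in range(10000)]

# Summary-based estimate: rows per group = total rows / distinct values.
est = len(rows) / len(set(rows))          # 10.0 rows per group

# Insert 500 extra tuples into a single group ("thousand = 42").
rows += [42] * 500

# The whole-attribute estimate barely moves...
new_est = len(rows) / len(set(rows))      # 10.5 rows per group

# ...but the actual count for the skewed group is now 51x the estimate.
actual_42 = Counter(rows)[42]             # 510
```

This is the "local" sensitivity described above: nothing in the summary statistics warns the planner that one particular grouping has become far more expensive than the average suggests.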