Re: Use of additional index columns in rows filtering - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Use of additional index columns in rows filtering
Msg-id: c000f024-5ee2-609e-e06b-e521316d2ac6@enterprisedb.com
In response to: Re: Use of additional index columns in rows filtering (James Coleman <jtc331@gmail.com>)
Responses: Re: Use of additional index columns in rows filtering
List: pgsql-hackers
On 6/21/23 18:17, James Coleman wrote:
> On Wed, Jun 21, 2023 at 11:28 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 6/21/23 14:45, James Coleman wrote:
>>> Hello,
>>>
>>> I've cc'd Jeff Davis on this due to a conversation we had at PGCon
>>> about applying filters on index tuples during index scans.
>>>
>>> I've also cc'd Andres Freund because I think this relates to his
>>> musing in [1] that:
>>>> One thing I have been wondering around this is whether we should not have
>>>> split the code for IOS and plain indexscans...
>>>
>>> I think I remember Peter Geoghegan also wondering (I can't remember if
>>> this was in conversation at PGCon about index skip scans or in a
>>> hackers thread) about how we compose these various index scan
>>> optimizations.
>>>
>>> To be certain this is probably a thing to tackle as a follow-on to
>>> this patch, but it does seem to me that what we are implicitly
>>> realizing is that (unlike with bitmap scans, I think) it doesn't
>>> really make a lot of conceptual sense to have index only scans be a
>>> separate node from index scans. Instead it's likely better to consider
>>> it an optimization to index scans that can dynamically kick in when
>>> it's able to be of use. That would allow it to compose with e.g.
>>> prefetching in the aforelinked thread. At the very least we would need
>>> pragmatic (e.g., cost of dynamically applying optimizations) rather
>>> than conceptual reasons to argue they should continue to be separate.
>>>
>>
>> I agree it seems a bit weird to have IOS as a separate node. In a way, I
>> think there are two dimensions for "index-only" scans - which pages can
>> be scanned like that, and which clauses can be evaluated with only the
>> index tuple. The current approach focuses on page visibility, but
>> ignores the other aspect entirely. Or more precisely, it disables IOS
>> entirely as soon as there's a single condition requiring the heap tuple.
>>
>> I agree it's probably better to see this as a single node with various
>> optimizations that can be applied when possible / efficient (based on
>> planner and/or dynamically).
>>
>> I'm not sure I see a direct link to the prefetching patch, but it's true
>> it needs to deal with tids (instead of slots), just like IOS. So if
>> the node worked with tids, maybe the prefetching could be done at that
>> level (which I now realize may be what Andres meant by doing prefetching
>> in the executor).
>
> The link to prefetching is that IOS (as a separate node) won't benefit
> from prefetching (I think) with your current prefetching patch (in the
> case where the VM doesn't allow us to just use the index tuple),
> whereas if the nodes were combined that would more naturally be
> composable.
>

Yeah, mostly. Although just unifying "regular" indexscans and IOS would
not allow prefetching for IOS. The reason why the current design does not
allow doing prefetching for IOS is that the prefetching happens deep in
indexam.c, and down there we don't know which TIDs are not from
all-visible pages and would need prefetching. Which is another good
reason to do the prefetching at the executor level, I believe.

>>> Apologies for that lengthy preamble; on to the patch under discussion:
>>>
>>> On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I took a stab at this and implemented the trick with the VM - during
>>>> index scan, we also extract the filters that only need the indexed
>>>> attributes (just like in IOS).
>>>> And then, during the execution we:
>>>>
>>>> 1) scan the index using the scan keys (as before)
>>>>
>>>> 2) if the heap page is all-visible, we check the new filters that can
>>>> be evaluated on the index tuple
>>>>
>>>> 3) fetch the heap tuple and evaluate the filters
>>>
>>> Thanks for working on this; I'm excited about this class of work
>>> (along with index prefetching and other ideas I think there's a lot of
>>> potential for improving index scans).
>>>
>>>> This is pretty much exactly the same thing we do for IOS, so I don't see
>>>> why this would be incorrect while IOS is correct.
>>>>
>>>> This also adds "Index Filter" to explain output, to show which filters
>>>> are executed on the index tuple (at the moment the filters are a subset
>>>> of "Filter"), so if the index tuple matches we'll execute them again on
>>>> the heap tuple. I guess that could be fixed by having two "filter"
>>>> lists, depending on whether we were able to evaluate the index filters.
>>>
>>> Given that we show index filters and heap filters separately it seems
>>> like we might want to maintain separate instrumentation counts of how
>>> many tuples were filtered by each set of filters.
>>>
>>
>> Yeah, separate instrumentation counters would be useful. What I was
>> talking about was more about the conditions themselves, because right
>> now we re-evaluate the index-only clauses on the heap tuple.
>>
>> Imagine an index on t(a) and a query that has WHERE (a = 1) AND (b = 1).
>> The patch splits this into two lists:
>>
>>   index-only clauses: (a=1)
>>   clauses: (a=1) AND (b=1)
>>
>> So we evaluate (a=1) first, and then we fetch the heap tuple and check
>> "clauses" again, which however includes the (a=1) again. That's fine for
>> cheap clauses (or when (a=1) eliminated a lot of tuples using just the
>> index), but for expensive clauses it might hurt.
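To make the control flow of steps 1-3 concrete, here is a toy, self-contained C simulation. None of this is actual PostgreSQL code: the visibility map, index, and heap are stand-in arrays, and eval_index_filter / eval_filter are hypothetical stand-ins for ExecQual. Note how the full filter re-checks (a = 1), mirroring the double-evaluation issue described above:

```c
/* Toy simulation of the proposed scan loop; illustrates control flow
 * only, not real executor code. */
#include <assert.h>
#include <stdbool.h>

typedef struct { int a; } IndexTuple;        /* index stores only column a */
typedef struct { int a; int b; } HeapTuple;

/* index-only filter: (a = 1), evaluable on the index tuple alone */
static bool eval_index_filter(IndexTuple it) { return it.a == 1; }

/* full filter: (a = 1) AND (b = 1), needs the heap tuple; note it
 * re-evaluates (a = 1), just like the current "clauses" list does */
static bool eval_filter(HeapTuple ht) { return ht.a == 1 && ht.b == 1; }

/* returns the number of heap fetches performed, fills *nmatches */
static int scan(const IndexTuple *idx, const HeapTuple *heap,
                const bool *all_visible, int ntuples, int *nmatches)
{
    int fetches = 0;
    *nmatches = 0;
    for (int i = 0; i < ntuples; i++)       /* 1) walk the index */
    {
        /* 2) page all-visible? try to reject using the index tuple */
        if (all_visible[i] && !eval_index_filter(idx[i]))
            continue;                       /* heap fetch avoided */

        /* 3) fetch the heap tuple and evaluate the full filter */
        fetches++;
        if (eval_filter(heap[i]))
            (*nmatches)++;
    }
    return fetches;
}
```

In this sketch the heap fetch is skipped only when the page is all-visible and the index-only filter rejects the tuple; everything else goes through the normal fetch-then-filter path.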
>>
>> It's fixable, we'd just need to keep two versions of the "clauses" list,
>> one for IOS mode (when index-only clauses were checked) and a complete
>> one when we need to check all clauses.
>
> In some cases (where the VM doesn't allow us to use just the index
> tuple) we'd have to execute both lists against the heap tuple, right?
>

Not quite. I suspect you imagine we'd have two lists

  L1: (a=1)
  L2: (b=1)

and that for all-visible pages we'd check L1 on the index, and then maybe
L2 on the heap. And for non-all-visible pages we'd check both on the
heap. But that doesn't work, because L1 has references to attnums in the
index tuple, while L2 has attnums to the heap. So we'd need

  L1: (a=1)            -> against index tuple
  L2: (b=1)            -> against heap tuple
  L3: (a=1) AND (b=1)  -> against heap tuple

and for non-all-visible pages we'd only use L3.

(I wonder if we could check if the tuple is visible and then go back and
check L1 on the index tuple, but I doubt it'd really be more efficient.)

>>>> Most of the patch is pretty mechanical - particularly the planning part
>>>> is about identifying filters that can be evaluated on the index tuple,
>>>> and that code was mostly shamelessly copied from index-only scan.
>>>>
>>>> The matching of filters to index is done in check_index_filter(), and
>>>> it's simpler than match_clause_to_indexcol() as it does not need to
>>>> consider operators etc. (I think). But maybe it should be careful about
>>>> other things, not sure.
>>>
>>> This would end up requiring some refactoring of the existing index
>>> matching code (or alternative caching on IndexOptInfo), but
>>> match_filter_to_index() calling check_index_filter() results in
>>> constructing a bitmapset of index columns for every possible filter,
>>> which seems wasteful (I recognize this is a bit of a proof-of-concept
>>> level v1).
>>>
>>
>> Probably, I'm sure there are a couple of other places where the current
>> API was a bit cumbersome and we could optimize.
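The three-list scheme above can be illustrated with a toy, self-contained C sketch. This is not PostgreSQL code: a "tuple" is just an array of ints, a clause is a hypothetical (attno, value) pair, and the layouts are made up. The point is that (a=1) must be compiled twice, once against the index tuple layout and once against the heap layout, because the attnums differ:

```c
/* Toy illustration of the L1 / L2 / L3 clause lists. */
#include <assert.h>
#include <stdbool.h>

typedef struct { int attno; int value; } Clause;

/* evaluate an AND-ed list of equality clauses against a tuple */
static bool eval(const Clause *clauses, int n, const int *tuple)
{
    for (int i = 0; i < n; i++)
        if (tuple[clauses[i].attno] != clauses[i].value)
            return false;
    return true;
}

/* heap tuple layout:  [x, a, b] -> a is heap attno 1, b is heap attno 2
 * index tuple layout: [a]       -> a is index attno 0 */
static const Clause L1[] = {{0, 1}};          /* (a=1), index attnos */
static const Clause L2[] = {{2, 1}};          /* (b=1), heap attnos  */
static const Clause L3[] = {{1, 1}, {2, 1}};  /* (a=1) AND (b=1), heap */

static bool matches(const int *index_tuple, const int *heap_tuple,
                    bool all_visible)
{
    if (all_visible)
        /* reject on the index tuple first, check the rest on the heap */
        return eval(L1, 1, index_tuple) && eval(L2, 1, heap_tuple);

    /* non-all-visible: evaluate only the complete list, on the heap */
    return eval(L3, 2, heap_tuple);
}
```

In the all-visible branch no clause is evaluated twice, while the non-all-visible branch uses L3 alone, exactly as described above.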
>>
>>>> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
>>>> earlier, the idea is to check VM and evaluate the filters on the index
>>>> tuple if possible, similar to index-only scans. Except that we then have
>>>> to fetch the heap tuple. Unfortunately, this means the code can't use
>>>> index_getnext_slot() anymore. Perhaps we should invent a new variant
>>>> that'd allow evaluating the index filters in between.
>>>
>>> It does seem there are some refactoring opportunities there.
>>>
>>
>> Actually, I realized maybe we should switch this to index_getnext_tid()
>> because of the prefetching patch. That would allow us to introduce a
>> "buffer" of TIDs, populated by index_getnext_tid(), and then do
>> prefetching based on that. It's similar to what bitmap scans do, except
>> that instead of the tbm iterator we get items from index_getnext_tid().
>>
>> I haven't tried implementing this yet, but I kinda like the idea as it
>> works no matter what exactly the AM does (i.e. it'd work even for cases
>> like GiST with distance searches).
>
> Oh, interesting, I'll let you keep chewing on that then.
>

Cool!

>>>> With the patch applied, the query plan changes from:
>>>>
>>>> ...
>>>>
>>>> to
>>>>
>>>>                             QUERY PLAN
>>>> -------------------------------------------------------------------
>>>>  Limit  (cost=0.42..3662.15 rows=1 width=12)
>>>>         (actual time=13.663..13.667 rows=0 loops=1)
>>>>    Buffers: shared hit=544
>>>>    ->  Index Scan using t_a_include_b on t
>>>>            (cost=0.42..3662.15 rows=1 width=12)
>>>>            (actual time=13.659..13.660 rows=0 loops=1)
>>>>          Index Cond: (a > 1000000)
>>>>          Index Filter: (b = 4)
>>>>          Rows Removed by Index Recheck: 197780
>>>>          Filter: (b = 4)
>>>>          Buffers: shared hit=544
>>>>  Planning Time: 0.105 ms
>>>>  Execution Time: 13.690 ms
>>>> (10 rows)
>>>>
>>>> ...
>>>
>>> I did also confirm that this properly identifies cases Jeff had
>>> mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10)
>>> = 4))".
>>>
>>
>> Good!
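The TID "buffer" idea mentioned above (a look-ahead window of TIDs pulled via index_getnext_tid(), with prefetches issued as the window fills) could look roughly like this toy, self-contained C sketch. It is not real PostgreSQL code: TIDs are plain ints, the "AM" is an array, PREFETCH_DISTANCE is an arbitrary made-up constant, and prefetch() only counts what would have been a PrefetchBuffer() call:

```c
/* Toy sketch of an executor-level TID look-ahead buffer. */
#include <assert.h>

#define PREFETCH_DISTANCE 4

typedef struct
{
    const int *tids;        /* stand-in for index_getnext_tid() results */
    int        ntids;
    int        next_load;   /* next TID to pull from the "AM" */
    int        next_return; /* next TID to hand to the caller */
    int        prefetched;  /* number of prefetch requests issued */
} TidQueue;

static void prefetch(TidQueue *q, int tid)
{
    (void) tid;             /* real code would issue an async read here */
    q->prefetched++;
}

/* returns the next TID, or -1 when the scan is exhausted */
static int tid_queue_next(TidQueue *q)
{
    /* keep the look-ahead window full, prefetching as we go */
    while (q->next_load < q->ntids &&
           q->next_load - q->next_return < PREFETCH_DISTANCE)
        prefetch(q, q->tids[q->next_load++]);

    if (q->next_return >= q->next_load)
        return -1;
    return q->tids[q->next_return++];
}
```

Because the buffer only consumes TIDs from the AM one at a time, it stays agnostic to how the AM produces them, which is what would make it work even for ordered scans like GiST distance searches.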
>>
>>> I noticed also you still had questions/TODOs about handling index
>>> scans for join clauses.
>>>
>>
>> Not sure which questions/TODOs you refer to, but I don't recall any
>> issues with join clauses. But maybe I just forgot.
>
> I was referring to the comment:
>
> + * FIXME Maybe this should fill the filterset too?
>
> above match_eclass_clauses_to_index()'s definition.
>

Ah, yeah. I'm sure there are some loose ends in the matching code.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company