Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) |
Date | |
Msg-id | CAH2-Wzk-NiFFVc=XayoqO4ymSbmJm121H370X8xPqCBYNQYyKw@mail.gmail.com Whole thread Raw |
In response to | Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Thu, Feb 27, 2025 at 3:42 PM Robert Haas <robertmhaas@gmail.com> wrote: > +1 for having some instrumentation. I do not agree with Tom that these > are numbers that only Peter Geoghegan and 2-3 other people will ever > understand. I grant that it's not going to make sense to everyone, but > the number of people to which it will make sense I would guess is > probably in the hundreds or the thousands rather than the single > digits. I agree that it's likely to be of interest to only a minority of users. But a fairly large minority. > Good documentation could help. It's easy to produce an example that makes intuitive sense. For example, with skip scan that has a qual such as "WHERE a BETWEEN 1 and 5 AND b = 12345", it is likely that EXPLAIN ANALYZE will show "Index Searches: 5" -- one search per "a" value. Such an example might be more useful than my original pgbench_accounts example. Do you think that that would help? > So, where should we store that information? > > The thing that's odd about using IndexScanDesc is that it doesn't > contain any other instrumentation currently, or at least not that I > can see. It is unique right now, but perhaps only because this is the first piece of instrumentation that: A. We must track at the level of an individual index scan -- not at the level of an index relation, like pgstat_count_index_scan(), nor at the level of the whole system, like BufferUsage state. AND B. Requires that we count something that fundamentally lives inside the index AM -- something that we cannot reasonably track/infer from the executor proper (more on that below, in my response to your scheme #1). > First, that struct has just four counters (ntuples, > ntuples2, nfiltered, nfiltered2). For an index-only scan, all four of > them are in use -- a plain index scan does not use ntuples2 -- but I > think this optimization can apply to both index and index-only scans, > so we just don't have room. Right, index-only scans have exactly the same requirements as plain index scans/bitmap index scans. > 1. Add a new field to struct Instrumentation. Make > index_getnext_slot() and possibly other similar functions return a > count of index searches via a new parameter, and use that to bump the > new struct Instrumentation counter. It's a little ugly to have to add > a special-purpose parameter for this, but it doesn't look like there > are tons and tons of calls so maybe it's OK. That seems like the strongest possible alternative to the original scheme used in the current draft patch (scheme #3). This scheme #1 has the same issue as scheme #3, though: it still requires an integer counter that tracks the number of index searches (something a bit simpler than that, such as a boolean flag set once per amgettuple call, won't do). This is due to there being no fixed limit on the number of index searches required during any single amgettuple call: in general the index AM may perform quite a few index searches before it is able to return the first tuple to the scan (or before it can return the next tuple). The only difference that I can see between scheme #1 and scheme #3 is that the former requires 2 counters instead of just 1. And, we'd still need to have 1 out of the 2 counters located either in IndexScanDesc itself (just like scheme #3), or located in some other struct that can at least be accessed through IndexScanDesc (like the index AM opaque state, per scheme #4). After all, *every* piece of state known to any amgettuple routine must ultimately come from IndexScanDesc (or from backend global state, per scheme #2). (Actually, I supposed it is technically possible to avoid storing anything in IndexScanDesc by inventing another amgettuple argument, just for this. That seems like a distinction without a difference, though.) > 2. Add a new field to BufferUsage. Then the AM can bump this field and > explain.c will know about it the same way it knows about other changes > to pgBufferUsage. However, this is not really about buffer usage and > the new field seems utterly unlike the other things that are in that > structure, so this seems really bad to me. I agree. The use of global variables seems quite inappropriate for something like this. It'll result in wrong output whenever an index scan uses an InitPlan in its qual when that InitPlan is itself a plan that contains an index scan (this is already an issue with "Buffers:" instrumentation, but this would be much worse). > 3. Add a new field to IndexScanDesc, as you originally proposed. This > seems similar to #1: it's still shoveling instrumentation data around > in a way that we don't currently, but instead of shoveling it through > a new parameter, we shovel it through a new structure member. Either > way, the new thing (parameter or structure member) doesn't really look > like it belongs with what's already there, so it seems like > conservation of ugliness. Perhaps a comment noting why the new counter lives in IndexScanDesc would help? > 4. Add a new field to the btree-specific structure referenced by the > IndexScanDesc's opaque pointer. I think this is what Matthias was > proposing. It doesn't seem particularly hard to implement. and seems > similar to #1 and #3. It definitely isn't hard to implement. But... > It is not clear to me that any of #1, #3, and #4 are radically better > than any of the other ones, with the following exception: it would be > a poor idea to choose #4 over #3 if this field will ultimately be used > for a bunch of different AMs, and a poor idea to choose #3 over #4 if > it's always going to be interesting only for btree. ...the requirements here really are 100% index-AM-generic. So I just don't see any advantage to scheme #4. All that I propose to do here is to display the information that is already tracked by pgstat_count_index_scan()/pg_stat_user_tables.idx_scan (at the level of each index relation) when EXPLAIN ANALYZE is run (at the level of each index scan). I'm not inventing a new concept; I'm extending an existing index-AM-generic approach. It just so happens that there is a greater practical need for that information within B-Tree scans. Admittedly, the actual amount of code the patch adds to nbtree is slightly more than it'll add to other index AMs (all of which require exactly one added line of code). But that's only because nbtree alone supports parallel index scans. Were it not for that, nbtree would also require only a single additional line of code. > I'll defer to you on which of those things is the case, but with the request that you > think about what is practically likely to happen and not advocate too > vigorously based on an argument that makes prominent use of the phrase > "in theory". Right now, the only instrumentation that lives inside index AMs is pgstat_count_index_scan(), which works at the relation level (not the scan level). Matthias may well be right that we'll eventually want to add more stuff like this. For example, maybe we'd show the number of index tuples that we evaluated against the scan's qual that were not actually returned to the executor proper. Then we really would be inventing a whole new concept -- but a concept that was also index-AM-neutral. I think that that's likely to be true generally, for all manner of possible instrumentation improvements that live inside index AMs -- so we'll likely end up putting more fields in IndexScanDesc for that stuff. > To be honest, I don't really like any of these options > very much: they all seem a tad inelegant. But sometimes that is a > necessary evil when inventing something new. I believe if I were > implementing this myself I would probably try #1 first; if that ended > up seeming too ugly, then I would fall back to #3 or #4. I do get that. I hope that you don't think that I've failed to take your feedback on board. I ended up at #3 only through yak-shaving/trial-and-error coding. Hopefully my reasoning makes sense. > Does that help? Yes, it does -- hugely. I'll need to think about it some more. The documentation definitely needs more work, at a minimum. Thanks for the review! -- Peter Geoghegan
pgsql-hackers by date: