Thread: Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Matthias van de Meent
Date:
On Thu, 15 Aug 2024 at 21:23, Peter Geoghegan <pg@bowt.ie> wrote: > > Attached patch has EXPLAIN ANALYZE display the total number of > primitive index scans for all 3 kinds of index scan node. This is > useful for index scans that happen to use SAOP arrays. It also seems > almost essential to offer this kind of instrumentation for the skip > scan patch [1]. Skip scan works by reusing all of the Postgres 17 work > (see commit 5bf748b8) to skip over irrelevant sections of a composite > index with a low cardinality leading column, so it has all the same > issues. Did you notice the patch over at [0], where additional diagnostic EXPLAIN output for btrees is being discussed, too? I'm asking, because I'm not very convinced that 'primitive scans' are a useful metric across all (or even: most) index AMs (e.g. BRIN probably never will have a 'primitive scans' metric that differs from the loop count), so maybe this would better be implemented in that framework? Kind regards, Matthias van de Meent Neon (https://neon.tech) [0] https://www.postgresql.org/message-id/flat/TYWPR01MB10982D24AFA7CDC273445BFF0B1DC2%40TYWPR01MB10982.jpnprd01.prod.outlook.com#9c64cf75179da8d657a5eab7c75be480
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Thu, Aug 15, 2024 at 4:34 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > Attached patch has EXPLAIN ANALYZE display the total number of > > primitive index scans for all 3 kinds of index scan node. This is > > useful for index scans that happen to use SAOP arrays. It also seems > > almost essential to offer this kind of instrumentation for the skip > > scan patch [1]. Skip scan works by reusing all of the Postgres 17 work > > (see commit 5bf748b8) to skip over irrelevant sections of a composite > > index with a low cardinality leading column, so it has all the same > > issues. > > Did you notice the patch over at [0], where additional diagnostic > EXPLAIN output for btrees is being discussed, too? To be clear, for those that haven't been paying attention to that other thread: that other EXPLAIN patch (the one authored by Masahiro Ikeda) surfaces information about a distinction that the skip scan patch renders obsolete. That is, the skip scan patch makes all "Non Key Filter" quals into quals that can relocate the scan to a later leaf page by starting a new primitive index scan. Technically, skip scan removes the concept that that patch calls "Non Key Filter" altogether. Note that this isn't the same thing as making that other patch obsolete. Skip scan renders the whole concept of "Non Key Filter" obsolete *in name only*. You might prefer to think of it as making that whole concept squishy. Just because we can theoretically use the leading column to skip doesn't mean we actually will. It isn't an either/or thing. We might skip during some parts of a scan, but not during other parts. It's just not clear how to handle those sorts of fuzzy distinctions right now. It does seem worth pursuing, but I see no conflict. > I'm asking, because > I'm not very convinced that 'primitive scans' are a useful metric > across all (or even: most) index AMs (e.g. BRIN probably never will > have a 'primitive scans' metric that differs from the loop count), so > maybe this would better be implemented in that framework? What do you mean by "within that framework"? They seem orthogonal? It's true that BRIN index scans will probably never show more than a single primitive index scan. I don't think that the same is true of any other index AM, though. Don't they all support SAOPs, albeit non-natively? The important question is: what do you want to do about cases like the BRIN case? Our choices are all fairly obvious choices. We can be selective, and *not* show this information when a set of heuristics indicate that it's not relevant. This is fairly straightforward to implement. Which do you prefer: overall consistency, or less verbosity? Personally I think that the consistency argument works in favor of displaying this information for every kind of index scan. That's a hopelessly subjective position, though. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Matthias van de Meent
Date:
On Thu, 15 Aug 2024 at 23:10, Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, Aug 15, 2024 at 4:34 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > Attached patch has EXPLAIN ANALYZE display the total number of > > > primitive index scans for all 3 kinds of index scan node. This is > > > useful for index scans that happen to use SAOP arrays. It also seems > > > almost essential to offer this kind of instrumentation for the skip > > > scan patch [1]. Skip scan works by reusing all of the Postgres 17 work > > > (see commit 5bf748b8) to skip over irrelevant sections of a composite > > > index with a low cardinality leading column, so it has all the same > > > issues. > > > > Did you notice the patch over at [0], where additional diagnostic > > EXPLAIN output for btrees is being discussed, too? > > To be clear, for those that haven't been paying attention to that > other thread: that other EXPLAIN patch (the one authored by Masahiro > Ikeda) surfaces information about a distinction that the skip scan > patch renders obsolete. That is, the skip scan patch makes all "Non > Key Filter" quals into quals that can relocate the scan to a later > leaf page by starting a new primitive index scan. Technically, skip > scan removes the concept that that patch calls "Non Key Filter" > altogether. > > Note that this isn't the same thing as making that other patch > obsolete. Skip scan renders the whole concept of "Non Key Filter" > obsolete *in name only*. You might prefer to think of it as making > that whole concept squishy. Just because we can theoretically use the > leading column to skip doesn't mean we actually will. It isn't an > either/or thing. We might skip during some parts of a scan, but not > during other parts. Yes. > It's just not clear how to handle those sorts of fuzzy distinctions > right now. It does seem worth pursuing, but I see no conflict. > > > I'm asking, because > > I'm not very convinced that 'primitive scans' are a useful metric > > across all (or even: most) index AMs (e.g. BRIN probably never will > > have a 'primitive scans' metric that differs from the loop count), so > > maybe this would better be implemented in that framework? > > What do you mean by "within that framework"? They seem orthogonal? What I meant was putting this 'primitive scans' info into the AM-specific explain callback as seen in the latest patch version. > It's true that BRIN index scans will probably never show more than a > single primitive index scan. I don't think that the same is true of > any other index AM, though. Don't they all support SAOPs, albeit > non-natively? Not always. For Bitmap Index Scan the node's functions can allow non-native SAOP support (it ORs the bitmaps), but normal indexes without SAOP support won't get SAOP-functionality from the IS/IOS node's infrastructure, it'll need to be added as Filter. > The important question is: what do you want to do about cases like the > BRIN case? Our choices are all fairly obvious choices. We can be > selective, and *not* show this information when a set of heuristics > indicate that it's not relevant. This is fairly straightforward to > implement. Which do you prefer: overall consistency, or less > verbosity? Consistency, I suppose. But adding explain attributes left and right in Index Scan's explain output when and where every index type needs them doesn't scale, so I'd put index-specific output into it's own system (see the linked thread for more rationale). And, in this case, the use case seems quite index-specific, at least for IS/IOS nodes. > Personally I think that the consistency argument works in favor of > displaying this information for every kind of index scan. Agreed, assuming "this information" is indeed shared (and useful) across all AMs. This made me notice that you add a new metric that should generally be exactly the same as pg_stat_all_indexes.idx_scan (you mention the same). Can't you pull that data, instead of inventing a new place every AMs needs to touch for it's metrics? Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Thu, Aug 15, 2024 at 5:47 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > > I'm asking, because > > > I'm not very convinced that 'primitive scans' are a useful metric > > > across all (or even: most) index AMs (e.g. BRIN probably never will > > > have a 'primitive scans' metric that differs from the loop count), so > > > maybe this would better be implemented in that framework? > > > > What do you mean by "within that framework"? They seem orthogonal? > > What I meant was putting this 'primitive scans' info into the > AM-specific explain callback as seen in the latest patch version. I don't see how that could work. This is fundamentally information that is only known when the query has fully finished execution. Again, this is already something that we track at the whole-table level, within pg_stat_user_tables.idx_scan. It's already considered index AM agnostic information, in that sense. > > It's true that BRIN index scans will probably never show more than a > > single primitive index scan. I don't think that the same is true of > > any other index AM, though. Don't they all support SAOPs, albeit > > non-natively? > > Not always. For Bitmap Index Scan the node's functions can allow > non-native SAOP support (it ORs the bitmaps), but normal indexes > without SAOP support won't get SAOP-functionality from the IS/IOS > node's infrastructure, it'll need to be added as Filter. Again, what do you want me to do about it? Almost anything is possible in principle, and can be implemented without great difficulty. But you have to clearly say what you want, and why you want it. Yeah, non-native SAOP index scans are always bitmap scans. In the case of GIN, there are only lossy/bitmap index scans, anyway -- can't see that ever changing. In the case of GiST, we could in the future add native SAOP support, so do we really want to be inconsistent in what we show now? (Tom said something about that recently, in fact.) I don't hate the idea of selectively not showing this information (for BRIN, say). Just as I don't hate the idea of totally omitting "loops=1" in the common case where we couldn't possibly be more than one loop in practice. It's just that I don't think that it's worth it, on balance. Not all redundancy is bad. > > The important question is: what do you want to do about cases like the > > BRIN case? Our choices are all fairly obvious choices. We can be > > selective, and *not* show this information when a set of heuristics > > indicate that it's not relevant. This is fairly straightforward to > > implement. Which do you prefer: overall consistency, or less > > verbosity? > > Consistency, I suppose. But adding explain attributes left and right > in Index Scan's explain output when and where every index type needs > them doesn't scale, so I'd put index-specific output into it's own > system (see the linked thread for more rationale). I can't argue with that. I just don't think it's directly relevant. > And, in this case, > the use case seems quite index-specific, at least for IS/IOS nodes. I disagree. It's an existing concept, exposed in system views, and now in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less. The fact that it tends to be much more useful in the case of nbtree (at least for now) makes this no less true. > This made me notice that you add a new metric that should generally be > exactly the same as pg_stat_all_indexes.idx_scan (you mention the > same). I didn't imagine that that part was subtle. > Can't you pull that data, instead of inventing a new place > every AMs needs to touch for it's metrics? No. At least not in a way that's scoped to a particular index scan. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Matthias van de Meent
Date:
On Fri, 16 Aug 2024 at 00:34, Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, Aug 15, 2024 at 5:47 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > I'm asking, because > > > > I'm not very convinced that 'primitive scans' are a useful metric > > > > across all (or even: most) index AMs (e.g. BRIN probably never will > > > > have a 'primitive scans' metric that differs from the loop count), so > > > > maybe this would better be implemented in that framework? > > > > > > What do you mean by "within that framework"? They seem orthogonal? > > > > What I meant was putting this 'primitive scans' info into the > > AM-specific explain callback as seen in the latest patch version. > > I don't see how that could work. This is fundamentally information > that is only known when the query has fully finished execution. If the counter was put into the BTScanOpaque, rather than the IndexScanDesc, then this could trivially be used in an explain AM callback, as IndexScanDesc and ->opaque are both still available while building the explain output. As a result, it wouldn't bloat the IndexScanDesc for other index AMs who might not be interested in this metric. > Again, this is already something that we track at the whole-table > level, within pg_stat_user_tables.idx_scan. It's already considered > index AM agnostic information, in that sense. That's true, but for most indexes there is a 1:1 relationship between loops and idx_scan counts, with ony btree behaving differently in that regard. Not to say it isn't an important insight for btree, but just that it seems to be only relevant for btree and no other index I can think of right now. > > > It's true that BRIN index scans will probably never show more than a > > > single primitive index scan. I don't think that the same is true of > > > any other index AM, though. Don't they all support SAOPs, albeit > > > non-natively? > > > > Not always. For Bitmap Index Scan the node's functions can allow > > non-native SAOP support (it ORs the bitmaps), but normal indexes > > without SAOP support won't get SAOP-functionality from the IS/IOS > > node's infrastructure, it'll need to be added as Filter. > > Again, what do you want me to do about it? Almost anything is possible > in principle, and can be implemented without great difficulty. But you > have to clearly say what you want, and why you want it. I don't want anything, or anything done about it, but your statement that all index AMs support SAOP (potentially non-natively) is not true, as the non-native SAOP support is only for bitmap index scans, and index AMs aren't guaranteed to support bitmap index scans (e.g. pgvector's IVFFLAT and HNSW are good examples, as they only support amgettuple). > Yeah, non-native SAOP index scans are always bitmap scans. In the case > of GIN, there are only lossy/bitmap index scans, anyway -- can't see > that ever changing. GIN had amgettuple-based index scans until the fastinsert path was added, and with some work (I don't think it needs to be a lot) the feature can probably be returned to the AM. The GIN internals would probably only need relatively few changes, as they already seem to mostly use precise TID-based scans - the only addition would be a filter that prohibits returning tuples that were previously returned while scanning the fastinsert path during the normal index scan. > > And, in this case, > > the use case seems quite index-specific, at least for IS/IOS nodes. > > I disagree. It's an existing concept, exposed in system views, and now > in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less. To be precise, it is not precisely that, because it's a different counter that an AM must update when the pgstat data is updated if it wants the explain output to reflect the stats counter accurately. When an AM forgets to update one of these metrics (or fails to realize they have to both be updated) then they'd be out-of-sync. I'd prefer if an AM didn't have to account for it's statistics in more than one place. > > This made me notice that you add a new metric that should generally be > > exactly the same as pg_stat_all_indexes.idx_scan (you mention the > > same). > > I didn't imagine that that part was subtle. It wasn't, but it was not present in the first two paragraphs of the mail, which I had only skimmed when I sent my first reply (as you maybe could see indicated by the quote). That's why it took me until my second reply to realise these were considered to be equivalent, especially after I noticed the headerfile changes where you added a new metric rather than pulling data from existing stats. > > Can't you pull that data, instead of inventing a new place > > every AMs needs to touch for it's metrics? > > No. At least not in a way that's scoped to a particular index scan. Similar per-node counter data is pulled for the global (!) counters of pgBufferUsage, so why would it be less possible to gather such metrics for just one index's stats here? While I do think it won't be easy to find a good way to integrate this into EXPLAIN's Instrumentation, I imagine other systems (e.g. table scans) may benefit from a better integration and explanation of pgstat statistics in EXPLAIN, too. E.g. I'd love to be able to explain how many times which function was called in a plans' projections, and what the relevant time expendature for those functions is in my plans. This data is available with track_functions enabled, and diffing in the execution nodes should allow this to be shown in EXPLAIN output. It'd certainly be more expensive than not doing the analysis, but I believe that's what EXPLAIN options are for - you can show a more detailed analysis at the cost of increased overhead in the plan execution. Alternatively, you could update the patch so that only the field in IndexScan would need to be updated by the index AM by making the executor responsible to update the relation's stats at once at the end of the query with the data from the IndexScanDesc. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Tue, Aug 27, 2024 at 5:03 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > If the counter was put into the BTScanOpaque, rather than the > IndexScanDesc, then this could trivially be used in an explain AM > callback, as IndexScanDesc and ->opaque are both still available while > building the explain output. Right, "trivial". Except in that it requires inventing a whole new general purpose infrastructure. Meanwhile, Tom is arguing against even showing this very basic information in EXPLAIN ANALYZE.You see the problem? > As a result, it wouldn't bloat the > IndexScanDesc for other index AMs who might not be interested in this > metric. Why do you persist with the idea that it isn't useful for other index AMs? I mean it literally works in exactly the same way! It's literally indistinguishable to users, and works in a way that's consistent with historical behavior/definitions. > I don't want anything, or anything done about it, but your statement > that all index AMs support SAOP (potentially non-natively) is not > true, as the non-native SAOP support is only for bitmap index scans, > and index AMs aren't guaranteed to support bitmap index scans (e.g. > pgvector's IVFFLAT and HNSW are good examples, as they only support > amgettuple). Yes, there are some very minor exceptions -- index AMs where even non-native SAOPs won't be used. What difference does it make? > > > And, in this case, > > > the use case seems quite index-specific, at least for IS/IOS nodes. > > > > I disagree. It's an existing concept, exposed in system views, and now > > in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less. > > To be precise, it is not precisely that, because it's a different > counter that an AM must update when the pgstat data is updated if it > wants the explain output to reflect the stats counter accurately. Why does that matter? I could easily move the counter to the opaque struct, but that would make the patch longer and more complicated, for absolutely no benefit. > When an AM forgets to update one of these metrics (or fails to realize they > have to both be updated) then they'd be out-of-sync. I'd prefer if an > AM didn't have to account for it's statistics in more than one place. I could easily change the pgstat_count_index_scan macro so that index AMs were forced to do both, or neither. (Not that this is a real problem.) > > > Can't you pull that data, instead of inventing a new place > > > every AMs needs to touch for it's metrics? > > > > No. At least not in a way that's scoped to a particular index scan. > > Similar per-node counter data is pulled for the global (!) counters of > pgBufferUsage, so why would it be less possible to gather such metrics > for just one index's stats here? I told you why already, when we talked about this privately: there is no guarantee that it's the index indicated by the scan instrumentation. For example, due to syscache lookups. There's also the question of how we maintain the count for things like nestloop joins, where invocations of different index scan nodes may be freely woven together. So it just won't work. Besides, I thought that you wanted me to use some new field in BTScanOpaque? But now you want me to use a global counter. Which is it? > While I do think it won't be easy to > find a good way to integrate this into EXPLAIN's Instrumentation, I > imagine other systems (e.g. table scans) may benefit from a better > integration and explanation of pgstat statistics in EXPLAIN, too. E.g. > I'd love to be able to explain how many times which function was > called in a plans' projections, and what the relevant time expendature > for those functions is in my plans. Seems completely unrelated. > Alternatively, you could update the patch so that only the field in > IndexScan would need to be updated by the index AM by making the > executor responsible to update the relation's stats at once at the end > of the query with the data from the IndexScanDesc. I don't understand why this is an alternative to the other thing that you said. Or even why it's desirable. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Matthias van de Meent
Date:
On Tue, 27 Aug 2024 at 23:40, Peter Geoghegan <pg@bowt.ie> wrote: > > On Tue, Aug 27, 2024 at 5:03 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > If the counter was put into the BTScanOpaque, rather than the > > IndexScanDesc, then this could trivially be used in an explain AM > > callback, as IndexScanDesc and ->opaque are both still available while > > building the explain output. > > Right, "trivial". Except in that it requires inventing a whole new > general purpose infrastructure. Which seems to be in the process of being invented already elsewhere. > Meanwhile, Tom is arguing against even > showing this very basic information in EXPLAIN ANALYZE.You see the > problem? I think Tom's main issue is additional clutter when running just plain `explain analyze`, and he'd probably be fine with it if this was gated behind e.g. VERBOSE or a new "get me the AM's view of this node" -flag. > > As a result, it wouldn't bloat the > > IndexScanDesc for other index AMs who might not be interested in this > > metric. > > Why do you persist with the idea that it isn't useful for other index > AMs? Because - there are no other index AMs that would show a count that's different from loops (Yes, I'm explicitly ignoring bitmapscan's synthetic SAOP) - because there is already a place where this info is stored. > I mean it literally works in exactly the same way! It's literally > indistinguishable to users, and works in a way that's consistent with > historical behavior/definitions. Historically, no statistics/explain-only info is stored in the IndexScanDesc, all data inside that struct is relevant even when EXPLAIN was removed from the codebase. The same is true for TableScanDesc Now, you want to add this metadata to the struct. I'm quite hesitant to start walking on such a surface, as it might just be a slippery slope. > > I don't want anything, or anything done about it, but your statement > > that all index AMs support SAOP (potentially non-natively) is not > > true, as the non-native SAOP support is only for bitmap index scans, > > and index AMs aren't guaranteed to support bitmap index scans (e.g. > > pgvector's IVFFLAT and HNSW are good examples, as they only support > > amgettuple). > > Yes, there are some very minor exceptions -- index AMs where even > non-native SAOPs won't be used. What difference does it make? That not all index types (even: most index types) have no interesting performance numbers indicated by the count. > > > > And, in this case, > > > > the use case seems quite index-specific, at least for IS/IOS nodes. > > > > > > I disagree. It's an existing concept, exposed in system views, and now > > > in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less. > > > > To be precise, it is not precisely that, because it's a different > > counter that an AM must update when the pgstat data is updated if it > > wants the explain output to reflect the stats counter accurately. > > Why does that matter? Because to me it seels like one more thing an existing index AM's author needs to needlessly add to its index. > > When an AM forgets to update one of these metrics (or fails to realize they > > have to both be updated) then they'd be out-of-sync. I'd prefer if an > > AM didn't have to account for it's statistics in more than one place. > > I could easily change the pgstat_count_index_scan macro so that index > AMs were forced to do both, or neither. (Not that this is a real > problem.) That'd be one way to reduce the chances of accidental bugs, which seems like a good start. > > > > Can't you pull that data, instead of inventing a new place > > > > every AMs needs to touch for it's metrics? > > > > > > No. At least not in a way that's scoped to a particular index scan. > > > > Similar per-node counter data is pulled for the global (!) counters of > > pgBufferUsage, so why would it be less possible to gather such metrics > > for just one index's stats here? > > I told you why already, when we talked about this privately: there is > no guarantee that it's the index indicated by the scan > instrumentation. For the pgstat entry in rel->pgstat_info, it is _exactly_ guaranteed to be the index of the IndexScan node. pgBufferUsage happens to be global, but pgstat_info is gathered at the relation level. > For example, due to syscache lookups. Sure, if we're executing a query on catalogs looking at index's numscans might count multiple index scans if the index scan needs to access that same catalog table's data through that same catalog index, but in those cases I think it's an acceptable count difference. > There's also > the question of how we maintain the count for things like nestloop > joins, where invocations of different index scan nodes may be freely > woven together. So it just won't work. Gathering usage counters on interleaving execution nodes has been done for pgBufferUsage, so I don't see how it just won't work. To me, it seems very realistically possible. > Besides, I thought that you wanted me to use some new field in > BTScanOpaque? But now you want me to use a global counter. Which is > it? If you think it's important to have this info on all indexes then I'd prefer the pgstat approach over adding a field in IndexScanDescData. If instead you think that this is primarily important to expose for nbtree index scans, then I'd prefer putting it in the BTSO using e.g. the index AM analyze hook approach, as I think that's much more elegant than this. > > While I do think it won't be easy to > > find a good way to integrate this into EXPLAIN's Instrumentation, I > > imagine other systems (e.g. table scans) may benefit from a better > > integration and explanation of pgstat statistics in EXPLAIN, too. E.g. > > I'd love to be able to explain how many times which function was > > called in a plans' projections, and what the relevant time expendature > > for those functions is in my plans. > > Seems completely unrelated. I'd call "exposing function's pgstat data in explain" at least somewhat related to "exposing indexes' pgstat data in explain". > > Alternatively, you could update the patch so that only the field in > > IndexScan would need to be updated by the index AM by making the > > executor responsible to update the relation's stats at once at the end > > of the query with the data from the IndexScanDesc. > > I don't understand why this is an alternative to the other thing that > you said. Or even why it's desirable. I think it would be preferred over requiring Index AMs to maintain 2 fields in 2 very different locations but in the same way with the same update pattern. With the mentioned change, they'd only have to keep the ISD's numscans updated with rescans (or, _bt_first/_bt_search's). Your alternative approach of making pgstat_count_index_scan update both would probably have the same desired effect of requiring the AM author to only mind this one entry point for counting index scan stats. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Tue, Aug 27, 2024 at 7:22 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > On Tue, 27 Aug 2024 at 23:40, Peter Geoghegan <pg@bowt.ie> wrote: > > Right, "trivial". Except in that it requires inventing a whole new > > general purpose infrastructure. > > Which seems to be in the process of being invented already elsewhere. None of this stuff about implementation details really matters if there isn't agreement on what actual user-visible behavior we want. We're very far from that right now. > > Meanwhile, Tom is arguing against even > > showing this very basic information in EXPLAIN ANALYZE.You see the > > problem? > > I think Tom's main issue is additional clutter when running just plain > `explain analyze`, and he'd probably be fine with it if this was gated > behind e.g. VERBOSE or a new "get me the AM's view of this node" > -flag. I'm not at all confident that you're right about that. > > I mean it literally works in exactly the same way! It's literally > > indistinguishable to users, and works in a way that's consistent with > > historical behavior/definitions. > > Historically, no statistics/explain-only info is stored in the > IndexScanDesc, all data inside that struct is relevant even when > EXPLAIN was removed from the codebase. The same is true for > TableScanDesc Please try to separate questions about user-visible behavior from questions about the implementation. Here you're answering a point I'm making about user visible behavior with a point about where the counter goes. It's just not relevant. At all. > Now, you want to add this metadata to the struct. I'm quite hesitant > to start walking on such a surface, as it might just be a slippery > slope. I don't know why you seem to assume that it's inevitable that we'll get a huge amount of similar EXPLAIN ANALYZE instrumentation, of which this is just the start. It isn't. It's far from clear that even something like my patch will get in. > > Seems completely unrelated. > > I'd call "exposing function's pgstat data in explain" at least > somewhat related to "exposing indexes' pgstat data in explain". Not in any practical sense. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Matthias van de Meent
Date:
On Wed, 28 Aug 2024 at 01:42, Peter Geoghegan <pg@bowt.ie> wrote: > > On Tue, Aug 27, 2024 at 7:22 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > On Tue, 27 Aug 2024 at 23:40, Peter Geoghegan <pg@bowt.ie> wrote: > > > Right, "trivial". Except in that it requires inventing a whole new > > > general purpose infrastructure. > > > > Which seems to be in the process of being invented already elsewhere. > > None of this stuff about implementation details really matters if > there isn't agreement on what actual user-visible behavior we want. > We're very far from that right now. I'd expect the value to only be displayed for more verbose outputs (such as under VERBOSE, or another option, or an as of yet unimplemented unnamed "get me AM-specific info" option), and only if it differed from nloops or if the index scan is otherwise interesting and would benefit from showing this data, which would require AM involvement to check if the scan is "interesting". E.g. I think it's "interesting" to see only 1 index search /loop for an index SAOP (with array >>1 attribute, or parameterized), but not at all interesting to see 1 index search /loop for a scan with a single equality scankey on the only key attribute: if it were anything else that'd be an indication of serious issues (and we'd show it, because it wouldn't be 1 search per loop). > > > and works in a way that's consistent with > > > historical behavior/definitions. > > > > Historically, no statistics/explain-only info is stored in the > > IndexScanDesc, all data inside that struct is relevant even when > > EXPLAIN was removed from the codebase. The same is true for > > TableScanDesc > > Please try to separate questions about user-visible behavior from > questions about the implementation. Here you're answering a point I'm > making about user visible behavior with a point about where the > counter goes. It's just not relevant. At all. I thought you were talking about type definitions with your 'definitions', but apparently not. What were you referring to with "consistent with historical behavior/definitions"? > > Now, you want to add this metadata to the struct. I'm quite hesitant > > to start walking on such a surface, as it might just be a slippery > > slope. > > I don't know why you seem to assume that it's inevitable that we'll > get a huge amount of similar EXPLAIN ANALYZE instrumentation, of which > this is just the start. It isn't. It's far from clear that even > something like my patch will get in. It doesn't have to be a huge amount, but I'd be extremely careful setting a precedent where scandescs will have space reserved for data that can be derived from other fields, and is also used by approximately 0% of queries in any production workload (except when autoanalyze is enabled, in which case there are other systems that could probably gather this data). Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Robert Haas
Date:
On Tue, Aug 27, 2024 at 7:22 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > Besides, I thought that you wanted me to use some new field in > > BTScanOpaque? But now you want me to use a global counter. Which is > > it? > > If you think it's important to have this info on all indexes then I'd > prefer the pgstat approach over adding a field in IndexScanDescData. > If instead you think that this is primarily important to expose for > nbtree index scans, then I'd prefer putting it in the BTSO using e.g. > the index AM analyze hook approach, as I think that's much more > elegant than this. I agree with this analysis. I don't see why IndexScanDesc would ever be the right place for this. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Wed, Aug 28, 2024 at 9:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > > If you think it's important to have this info on all indexes then I'd > > prefer the pgstat approach over adding a field in IndexScanDescData. > > If instead you think that this is primarily important to expose for > > nbtree index scans, then I'd prefer putting it in the BTSO using e.g. > > the index AM analyze hook approach, as I think that's much more > > elegant than this. > > I agree with this analysis. I don't see why IndexScanDesc would ever > be the right place for this. Then what do you think is the right place? There's no simple way to get to the planstate instrumentation from within an index scan. You could do it by passing it down as an argument to either ambeginscan or amrescan. But, realistically, it'd probably be better to just add a pointer to the instrumentation to the IndexScanDesc passed to amrescan. That's very close to what I've done already. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Robert Haas
Date:
On Wed, Aug 28, 2024 at 9:41 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Wed, Aug 28, 2024 at 9:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > If you think it's important to have this info on all indexes then I'd > > > prefer the pgstat approach over adding a field in IndexScanDescData. > > > If instead you think that this is primarily important to expose for > > > nbtree index scans, then I'd prefer putting it in the BTSO using e.g. > > > the index AM analyze hook approach, as I think that's much more > > > elegant than this. > > > > I agree with this analysis. I don't see why IndexScanDesc would ever > > be the right place for this. > > Then what do you think is the right place? The paragraph that I agreed with and quoted in my reply, and that you then quoted in your reply to me, appears to me to address that exact question. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Wed, Aug 28, 2024 at 9:49 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Aug 28, 2024 at 9:41 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Aug 28, 2024 at 9:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > If you think it's important to have this info on all indexes then I'd > > > > prefer the pgstat approach over adding a field in IndexScanDescData. > > > > If instead you think that this is primarily important to expose for > > > > nbtree index scans, then I'd prefer putting it in the BTSO using e.g. > > > > the index AM analyze hook approach, as I think that's much more > > > > elegant than this. > > > > > > I agree with this analysis. I don't see why IndexScanDesc would ever > > > be the right place for this. > > > > Then what do you think is the right place? > > The paragraph that I agreed with and quoted in my reply, and that you > then quoted in your reply to me, appears to me to address that exact > question. Are you talking about adding global counters, in the style of pgBufferUsage? Or are you talking about adding it to BTSO? If it's the latter, then why isn't that at least as bad? It's just the IndexScanDesc thing, but with an additional indirection. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Matthias van de Meent
Date:
On Wed, 28 Aug 2024 at 15:53, Peter Geoghegan <pg@bowt.ie> wrote: > On Wed, Aug 28, 2024 at 9:49 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Aug 28, 2024 at 9:41 AM Peter Geoghegan <pg@bowt.ie> wrote: > > > On Wed, Aug 28, 2024 at 9:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > If you think it's important to have this info on all indexes then I'd > > > > > prefer the pgstat approach over adding a field in IndexScanDescData. > > > > > If instead you think that this is primarily important to expose for > > > > > nbtree index scans, then I'd prefer putting it in the BTSO using e.g. > > > > > the index AM analyze hook approach, as I think that's much more > > > > > elegant than this. > > > > > > > > I agree with this analysis. I don't see why IndexScanDesc would ever > > > > be the right place for this. > > > > > > Then what do you think is the right place? > > > > The paragraph that I agreed with and quoted in my reply, and that you > > then quoted in your reply to me, appears to me to address that exact > > question. > > Are you talking about adding global counters, in the style of pgBufferUsage? My pgstat approach would be that ExecIndexScan (plus ExecIOS and ExecBitmapIS) could record the current state of relevant fields from node->ss.ss_currentRelation->pgstat_info, and diff them with the recorded values at the end of that node's execution, pushing the result into e.g. Instrumentation; diffing which is similar to what happens in InstrStartNode() and InstrStopNode() but for the relation's pgstat_info instead of pgBufferUsage and pgWalUsage. Alternatively this could happen in ExecProcNodeInstr, but it'd need some more special-casing to make sure it only addresses (index) relation scan nodes. By pulling the stats directly from Relation->pgstat_info, no catalog index scans are counted if they aren't also the index which is subject to that [Bitmap]Index[Only]Scan. In effect, this would need no changes in AM code, as this would "just" pull the data from those statistics which are already being updated in AM code. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Wed, Aug 28, 2024 at 9:52 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Wed, Aug 28, 2024 at 9:49 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > I agree with this analysis. I don't see why IndexScanDesc would ever > > > > be the right place for this. > > > > > > Then what do you think is the right place? > > > > The paragraph that I agreed with and quoted in my reply, and that you > > then quoted in your reply to me, appears to me to address that exact > > question. > > Are you talking about adding global counters, in the style of pgBufferUsage? > > Or are you talking about adding it to BTSO? If it's the latter, then > why isn't that at least as bad? It's just the IndexScanDesc thing, but > with an additional indirection. I need more feedback about this. I don't understand your perspective here. If I commit the skip scan patch, but don't have something like this instrumentation in place, it seems quite likely that users will complain about how opaque its behavior is. While having this instrumentation isn't quite a blocker to committing the skip scan patch, it's not too far off, either. I want to be pragmatic. Any approach that's deemed acceptable is fine by me, provided it implements approximately the same behavior as the patch that I wrote implements. Where is this state that tracks the number of index searches going to live, if not in IndexScanDesc? I get why you don't particularly care for that. But I don't understand what the alternative you favor looks like. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Mon, Feb 17, 2025 at 5:44 PM Peter Geoghegan <pg@bowt.ie> wrote: > I need more feedback about this. I don't understand your perspective here. Attached is a version of the patch that will apply cleanly against HEAD. (This is from v26 of my skip scan patch, which is why I've skipped so many version numbers compared to the last patch posted on this thread.) I still haven't changed anything about the implementation, since this is just to keep CFTester happy. -- Peter Geoghegan
Attachment
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Robert Haas
Date:
On Mon, Feb 17, 2025 at 5:44 PM Peter Geoghegan <pg@bowt.ie> wrote: > I need more feedback about this. I don't understand your perspective here. > > If I commit the skip scan patch, but don't have something like this > instrumentation in place, it seems quite likely that users will > complain about how opaque its behavior is. While having this > instrumentation isn't quite a blocker to committing the skip scan > patch, it's not too far off, either. I want to be pragmatic. Any > approach that's deemed acceptable is fine by me, provided it > implements approximately the same behavior as the patch that I wrote > implements. > > Where is this state that tracks the number of index searches going to > live, if not in IndexScanDesc? I get why you don't particularly care > for that. But I don't understand what the alternative you favor looks > like. +1 for having some instrumentation. I do not agree with Tom that these are numbers that only Peter Geoghegan and 2-3 other people will ever understand. I grant that it's not going to make sense to everyone, but the number of people to which it will make sense I would guess is probably in the hundreds or the thousands rather than the single digits. Good documentation could help. So, where should we store that information? The thing that's odd about using IndexScanDesc is that it doesn't contain any other instrumentation currently, or at least not that I can see. Everything else that EXPLAIN prints in terms of index scan is printed by show_instrumentation_count() from planstate->instrument. So it seems reasonable to think maybe this should be part of planstate->instrument, too, but there seem to be at least two problems with that idea. First, that struct has just four counters (ntuples, ntuples2, nfiltered, nfiltered2). For an index-only scan, all four of them are in use -- a plain index scan does not use ntuples2 -- but I think this optimization can apply to both index and index-only scans, so we just don't have room. Second, those existing counters are used for things that we can count in the executor, but the executor won't know how many index searches occur down inside the AM. So I see the following possibilities: 1. Add a new field to struct Instrumentation. Make index_getnext_slot() and possibly other similar functions return a count of index searches via a new parameter, and use that to bump the new struct Instrumentation counter. It's a little ugly to have to add a special-purpose parameter for this, but it doesn't look like there are tons and tons of calls so maybe it's OK. 2. Add a new field to BufferUsage. Then the AM can bump this field and explain.c will know about it the same way it knows about other changes to pgBufferUsage. However, this is not really about buffer usage and the new field seems utterly unlike the other things that are in that structure, so this seems really bad to me. 3. Add a new field to IndexScanDesc, as you originally proposed. This seems similar to #1: it's still shoveling instrumentation data around in a way that we don't currently, but instead of shoveling it through a new parameter, we shovel it through a new structure member. Either way, the new thing (parameter or structure member) doesn't really look like it belongs with what's already there, so it seems like conservation of ugliness. 4. Add a new field to the btree-specific structure referenced by the IndexScanDesc's opaque pointer. I think this is what Matthias was proposing. It doesn't seem particularly hard to implement. and seems similar to #1 and #3. It is not clear to me that any of #1, #3, and #4 are radically better than any of the other ones, with the following exception: it would be a poor idea to choose #4 over #3 if this field will ultimately be used for a bunch of different AMs, and a poor idea to choose #3 over #4 if it's always going to be interesting only for btree. I'll defer to you on which of those things is the case, but with the request that you think about what is practically likely to happen and not advocate too vigorously based on an argument that makes prominent use of the phrase "in theory". To be honest, I don't really like any of these options very much: they all seem a tad inelegant. But sometimes that is a necessary evil when inventing something new. I believe if I were implementing this myself I would probably try #1 first; if that ended up seeming too ugly, then I would fall back to #3 or #4. Does that help? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Thu, Feb 27, 2025 at 3:42 PM Robert Haas <robertmhaas@gmail.com> wrote: > +1 for having some instrumentation. I do not agree with Tom that these > are numbers that only Peter Geoghegan and 2-3 other people will ever > understand. I grant that it's not going to make sense to everyone, but > the number of people to which it will make sense I would guess is > probably in the hundreds or the thousands rather than the single > digits. I agree that it's likely to be of interest to only a minority of users. But a fairly large minority. > Good documentation could help. It's easy to produce an example that makes intuitive sense. For example, with skip scan that has a qual such as "WHERE a BETWEEN 1 and 5 AND b = 12345", it is likely that EXPLAIN ANALYZE will show "Index Searches: 5" -- one search per "a" value. Such an example might be more useful than my original pgbench_accounts example. Do you think that that would help? > So, where should we store that information? > > The thing that's odd about using IndexScanDesc is that it doesn't > contain any other instrumentation currently, or at least not that I > can see. It is unique right now, but perhaps only because this is the first piece of instrumentation that: A. We must track at the level of an individual index scan -- not at the level of an index relation, like pgstat_count_index_scan(), nor at the level of the whole system, like BufferUsage state. AND B. Requires that we count something that fundamentally lives inside the index AM -- something that we cannot reasonably track/infer from the executor proper (more on that below, in my response to your scheme #1). > First, that struct has just four counters (ntuples, > ntuples2, nfiltered, nfiltered2). For an index-only scan, all four of > them are in use -- a plain index scan does not use ntuples2 -- but I > think this optimization can apply to both index and index-only scans, > so we just don't have room. Right, index-only scans have exactly the same requirements as plain index scans/bitmap index scans. > 1. Add a new field to struct Instrumentation. Make > index_getnext_slot() and possibly other similar functions return a > count of index searches via a new parameter, and use that to bump the > new struct Instrumentation counter. It's a little ugly to have to add > a special-purpose parameter for this, but it doesn't look like there > are tons and tons of calls so maybe it's OK. That seems like the strongest possible alternative to the original scheme used in the current draft patch (scheme #3). This scheme #1 has the same issue as scheme #3, though: it still requires an integer counter that tracks the number of index searches (something a bit simpler than that, such as a boolean flag set once per amgettuple call, won't do). This is due to there being no fixed limit on the number of index searches required during any single amgettuple call: in general the index AM may perform quite a few index searches before it is able to return the first tuple to the scan (or before it can return the next tuple). The only difference that I can see between scheme #1 and scheme #3 is that the former requires 2 counters instead of just 1. And, we'd still need to have 1 out of the 2 counters located either in IndexScanDesc itself (just like scheme #3), or located in some other struct that can at least be accessed through IndexScanDesc (like the index AM opaque state, per scheme #4). After all, *every* piece of state known to any amgettuple routine must ultimately come from IndexScanDesc (or from backend global state, per scheme #2). (Actually, I supposed it is technically possible to avoid storing anything in IndexScanDesc by inventing another amgettuple argument, just for this. That seems like a distinction without a difference, though.) > 2. Add a new field to BufferUsage. Then the AM can bump this field and > explain.c will know about it the same way it knows about other changes > to pgBufferUsage. However, this is not really about buffer usage and > the new field seems utterly unlike the other things that are in that > structure, so this seems really bad to me. I agree. The use of global variables seems quite inappropriate for something like this. It'll result in wrong output whenever an index scan uses an InitPlan in its qual when that InitPlan is itself a plan that contains an index scan (this is already an issue with "Buffers:" instrumentation, but this would be much worse). > 3. Add a new field to IndexScanDesc, as you originally proposed. This > seems similar to #1: it's still shoveling instrumentation data around > in a way that we don't currently, but instead of shoveling it through > a new parameter, we shovel it through a new structure member. Either > way, the new thing (parameter or structure member) doesn't really look > like it belongs with what's already there, so it seems like > conservation of ugliness. Perhaps a comment noting why the new counter lives in IndexScanDesc would help? > 4. Add a new field to the btree-specific structure referenced by the > IndexScanDesc's opaque pointer. I think this is what Matthias was > proposing. It doesn't seem particularly hard to implement. and seems > similar to #1 and #3. It definitely isn't hard to implement. But... > It is not clear to me that any of #1, #3, and #4 are radically better > than any of the other ones, with the following exception: it would be > a poor idea to choose #4 over #3 if this field will ultimately be used > for a bunch of different AMs, and a poor idea to choose #3 over #4 if > it's always going to be interesting only for btree. ...the requirements here really are 100% index-AM-generic. So I just don't see any advantage to scheme #4. All that I propose to do here is to display the information that is already tracked by pgstat_count_index_scan()/pg_stat_user_tables.idx_scan (at the level of each index relation) when EXPLAIN ANALYZE is run (at the level of each index scan). I'm not inventing a new concept; I'm extending an existing index-AM-generic approach. It just so happens that there is a greater practical need for that information within B-Tree scans. Admittedly, the actual amount of code the patch adds to nbtree is slightly more than it'll add to other index AMs (all of which require exactly one added line of code). But that's only because nbtree alone supports parallel index scans. Were it not for that, nbtree would also require only a single additional line of code. > I'll defer to you on which of those things is the case, but with the request that you > think about what is practically likely to happen and not advocate too > vigorously based on an argument that makes prominent use of the phrase > "in theory". Right now, the only instrumentation that lives inside index AMs is pgstat_count_index_scan(), which works at the relation level (not the scan level). Matthias may well be right that we'll eventually want to add more stuff like this. For example, maybe we'd show the number of index tuples that we evaluated against the scan's qual that were not actually returned to the executor proper. Then we really would be inventing a whole new concept -- but a concept that was also index-AM-neutral. I think that that's likely to be true generally, for all manner of possible instrumentation improvements that live inside index AMs -- so we'll likely end up putting more fields in IndexScanDesc for that stuff. > To be honest, I don't really like any of these options > very much: they all seem a tad inelegant. But sometimes that is a > necessary evil when inventing something new. I believe if I were > implementing this myself I would probably try #1 first; if that ended > up seeming too ugly, then I would fall back to #3 or #4. I do get that. I hope that you don't think that I've failed to take your feedback on board. I ended up at #3 only through yak-shaving/trial-and-error coding. Hopefully my reasoning makes sense. > Does that help? Yes, it does -- hugely. I'll need to think about it some more. The documentation definitely needs more work, at a minimum. Thanks for the review! -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Thu, Feb 27, 2025 at 7:57 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Thu, Feb 27, 2025 at 3:42 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Good documentation could help. Attached revision adds an example that shows how "Index Searches: N" can vary. This appears in "14.1.2. EXPLAIN ANALYZE". Other changes in this revision: * Improved commit message. * We now consistently show "Index Searches: N" after all other scan related output, so that it will reliably appear immediately before the "Buffers: " line. This seemed slightly better, since it is often useful to consider these two numbers together. My current plan is to commit this patch on Wednesday or Thursday, barring any objections. Thanks -- Peter Geoghegan
Attachment
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Robert Haas
Date:
On Thu, Feb 27, 2025 at 7:58 PM Peter Geoghegan <pg@bowt.ie> wrote: > It's easy to produce an example that makes intuitive sense. For > example, with skip scan that has a qual such as "WHERE a BETWEEN 1 and > 5 AND b = 12345", it is likely that EXPLAIN ANALYZE will show "Index > Searches: 5" -- one search per "a" value. Such an example might be > more useful than my original pgbench_accounts example. > > Do you think that that would help? Yes. > It is unique right now, but perhaps only because this is the first > piece of instrumentation that: Yeah, possible. > Perhaps a comment noting why the new counter lives in IndexScanDesc would help? +1. > I do get that. I hope that you don't think that I've failed to take > your feedback on board. To the contrary, I appreciate you taking the time to listen to my opinion. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
Committed just now. Thanks again. On Mon, Mar 3, 2025 at 4:01 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Feb 27, 2025 at 7:58 PM Peter Geoghegan <pg@bowt.ie> wrote: > > It's easy to produce an example that makes intuitive sense. For > > example, with skip scan that has a qual such as "WHERE a BETWEEN 1 and > > 5 AND b = 12345", it is likely that EXPLAIN ANALYZE will show "Index > > Searches: 5" -- one search per "a" value. Such an example might be > > more useful than my original pgbench_accounts example. > > > > Do you think that that would help? > > Yes. As you might have seen already, I added an example involving SAOPs to "14.1.2. EXPLAIN ANALYZE". I have a TODO item about adding an additional example involving skip scan immediately afterwards, as part of the skip scan patch. > > Perhaps a comment noting why the new counter lives in IndexScanDesc would help? > > +1. Added a IndexScanDesc comment about this to the committed version. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Wed, Mar 5, 2025 at 9:37 AM Peter Geoghegan <pg@bowt.ie> wrote: > Committed just now. Thanks again. I had to revert this for now, due to issues with debug_parallel_query. Apologies for the inconvenience. The immediate problem is that when the parallel leader doesn't participate, there is no valid IndexScanDescData in planstate to work off of. There isn't an obvious way to get to shared memory from the leader process, since that all goes through the IndexScanDescData.parallel_scan -- there is nothing that points to shared memory in any of the relevant planstate structs (namely IndexScanState, IndexOnlyScanState, and BitmapIndexScanState). I was hoping that you'd be able to provide some guidance on how best to fix this. I think that the problem here is similar to the problem with hash joins and their HashInstrumentation struct -- at least in the parallel-oblivious case. Here are the points of similarity: * The information in question is for the node execution as a whole -- it is orthogonal to what might have happened in each individual worker, and displays the same basic operation-level stats. It is independent of whether or not the scan happened to use parallel workers or not. * For the most part when running a parallel hash join it doesn't matter what worker EXPLAIN gets its stats from -- they should all agree on the details (in the parallel-oblivious case, though the parallel-aware case is still fairly similar). Comments in show_hash_info explain this. * However, there are important exceptions: cases where the parallel leader didn't participate at all, or showed up late, never building its own hash table. We have to be prepared to get the information from all workers, iff the leader doesn't have it. I failed to account for this last point. I wonder if I can fix this using an approach like the one from bugfix commit 5bcf389ecf. Note that show_hash_info has since changed; at the time of the commit we only had parallel oblivious hash joins, so it made sense to loop through SharedHashInfo for workers and go with the details taken from the first worker that successfully built a hash table (the hash tables must be identical anyway). As I said, a sticking point for this approach is that there is no existing way to get to someplace in shared memory from the parallel leader when it never participated. Parallel index scans have their ParallelIndexScanDesc state stored when they call index_beginscan_parallel. But that's not happening in a parallel leader that never participates. Parallel hash join doesn't have that problem, I think, because the leader will reliably get a pointer to shared state when ExecHashInitializeDSM() is called. As comments in its ExecParallelInitializeDSM caller put it, ExecHashInitializeDSM is called "even when not parallel-aware, for EXPLAIN ANALYZE" -- this makes it like a few other kinds of nodes, but not like index scan nodes. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 1:17 PM Peter Geoghegan <pg@bowt.ie> wrote: > The immediate problem is that when the parallel leader doesn't > participate, there is no valid IndexScanDescData in planstate to work > off of. There isn't an obvious way to get to shared memory from the > leader process, since that all goes through the > IndexScanDescData.parallel_scan -- there is nothing that points to > shared memory in any of the relevant planstate structs (namely > IndexScanState, IndexOnlyScanState, and BitmapIndexScanState). I was > hoping that you'd be able to provide some guidance on how best to fix > this. Hmm, it seems weird that you can't get a hold of that structure to me. Why can't you just go find it in the DSM? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Thu, Mar 6, 2025 at 1:54 PM Robert Haas <robertmhaas@gmail.com> wrote: > Hmm, it seems weird that you can't get a hold of that structure to me. > Why can't you just go find it in the DSM? Sorry, I was unclear. One reason is that there isn't necessarily anything to find. Certainly, when I try this out with a debugger, even the B-Tree scan doesn't have doesn't even have IndexScanDescData.parallel_scan set. It isn't actually a parallel B-Tree scan. It is a serial/non-parallel-aware index scan that is run from a parallel worker, and feeds its output into a gather merge node despite all this. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 1:58 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Thu, Mar 6, 2025 at 1:54 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Hmm, it seems weird that you can't get a hold of that structure to me. > > Why can't you just go find it in the DSM? > > Sorry, I was unclear. > > One reason is that there isn't necessarily anything to find. > Certainly, when I try this out with a debugger, even the B-Tree scan > doesn't have doesn't even have IndexScanDescData.parallel_scan set. It > isn't actually a parallel B-Tree scan. It is a > serial/non-parallel-aware index scan that is run from a parallel > worker, and feeds its output into a gather merge node despite all > this. Well, I think this calls the basic design into question. We discussed putting this into IndexScanDescData as a convenient way of piping it through to EXPLAIN, but what I think we have now discovered is that there isn't actually convenient at all, because every process has its own IndexScanDescData and the leader sees only its own. It seems like what you need is to have each process accumulate stats locally, and then at the end total them up. Maybe show_sort_info() has some useful precedent, since that's also a bit of node-specific instrumentation, and it seems to know what to do about workers. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Thu, Mar 6, 2025 at 2:12 PM Robert Haas <robertmhaas@gmail.com> wrote: > Well, I think this calls the basic design into question. We discussed > putting this into IndexScanDescData as a convenient way of piping it > through to EXPLAIN, but what I think we have now discovered is that > there isn't actually convenient at all, because every process has its > own IndexScanDescData and the leader sees only its own.\ I agree that it isn't convenient. But there's an inescapable need to pass *something* down to amgettuple. Everything that we currently pass to amgettuple goes through the IndexScanDesc arg (its only other arg is ScanDirection), which isn't a bad reason to put this here too. So I still think that we need to either store something like nsearches in IndexScanDescData, or store a pointer to some other struct that contains the nsearches field (and possibly other fields, in the future). The only alternative is to change the amtuple signature (e.g., pass down planstate), which doesn't seem like an improvement. > It seems like > what you need is to have each process accumulate stats locally, and > then at the end total them up. Maybe show_sort_info() has some useful > precedent, since that's also a bit of node-specific instrumentation, > and it seems to know what to do about workers. That seems similar to the hash join case I looked at. I think that the main problem with the reverted patch isn't that it uses IndexScanDescData -- that detail is almost inevitable. The main problem is that it failed to teach nodeIndexscan.c/nodeIndexonlyscan.c/nodeBitmapIndexscan.c to place the IndexScanDescData.nsearches counter somewhere where explain.c could later get to reliably. That'd probably be easier if IndexScanDescData.nsearches was a pointer instead of a raw integer. Thanks -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Thu, Mar 6, 2025 at 2:12 PM Robert Haas <robertmhaas@gmail.com> wrote: > Maybe show_sort_info() has some useful > precedent, since that's also a bit of node-specific instrumentation, > and it seems to know what to do about workers. What do you think of the attached WIP patch, which does things this way? Does this seem like the right general direction to you? Unfortunately, my new approach seems to require quite a bit more code, including adding new parallel query functions for bitmap index scans (which previously didn't require anything like this at all). I can probably simplify it some more, but likely not by much. I now put a pointer to an instrumentation struct in IndexScanDescData. The pointer always points to local memory: specifically, it points to a dedicated field in each of the 3 supported executor node planstate structs. Each of the workers copy their local instrumentation struct into a dedicated space in shared memory, at the point that ExecIndexScanRetrieveInstrumentation/ExecIndexOnlyScanRetrieveInstrumentation/ExecBitmapIndexScanRetrieveInstrumentation is called (though only when running during EXPLAIN ANALYZE). Once we get to explain.c, we take more or less the same approach already used for things like sort nodes and hash join nodes. Obviously, this revised version of the patch passes all tests when the tests are run with debug_parallel_query=regress. -- Peter Geoghegan
Attachment
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
From
Peter Geoghegan
Date:
On Fri, Mar 7, 2025 at 12:18 PM Peter Geoghegan <pg@bowt.ie> wrote: > What do you think of the attached WIP patch, which does things this > way? Does this seem like the right general direction to you? Attached is a more refined version of this patch, which is substantially the same the same as the version I posted yesterday. My current plan is to commit this on Tuesday or Wednesday, barring any objections. Thanks -- Peter Geoghegan