Re: Showing applied extended statistics in explain Part 2 - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Showing applied extended statistics in explain Part 2 |
Date | |
Msg-id | 7b2aac96-4b56-48b8-a07b-0ee877f3732c@vondra.me Whole thread Raw |
In response to | Re: Showing applied extended statistics in explain Part 2 (Tatsuro Yamada <yamatattsu@gmail.com>) |
List | pgsql-hackers |
On 2/12/25 06:02, Tatsuro Yamada wrote: > Hi Tomas and ALL, > >>I wonder what Yamada-san thinks about these suggestions ... He's the one >>actually developing the patch, so I'd like to know his opinions. > > I will state my thoughts on the two points of discussion. > >>>>> I often use SQL Server to compare execution plans generated by >>>>> PostgreSQL, and I appreciate how they display the usage of extended >>>>> statistics. They clearly identify which statistics were utilized during >>>>> query planning and include them in the summary section of query plan. >>>>> >>>> I admit not knowing what exactly SQL Server shows in the explain, but it >>>> seems helpful to know which part of the plan used which statistics, no? >>> >>> I wonder if you meant clauses, not a part of the plan. Clauses may be >>> used on different levels of the join tree, but they are estimated once. >>> >>> I meant we may just refer to the statistic used by its name in the >>> node's explanation if this node contains estimated clause. I use >>> multiple extended statistics. Sometimes, they intersect, and it is hard >>> to say which one was used for a specific clause. >> >>I did mean "part of the plan", i.e. the actual operation. That being >>said, maybe you're right the exact node doesn't matter all that much, >>and it'd be better to have a separate list of stats for the whole plan. >> >>That'd probably work better for stuff like estimate_num_groups(), which >>is hard to assign to a particular operation ... > > > The patch currently under development aims to display the actual usage of > extended statistics on a plan-by-plan basis. > I thought that being able to check the extended statistics used by plan > nodes > in a plan would make it easier to tune plans using extended statistics. > > On the other hand, I also think that a mechanism to check the usage of > extended statistics on a DB-by-DB basis would be useful. > I talked about this idea in my talk at PGConf.dev 2024. > This is similar to pg_stat_user_indexes for indexes. If you can find > unused or > infrequently used extended statistics, you can delete unnecessary ones. > However, this functionality is not included in this patch. I plan to > provide it as > a separate patch after the current patch is committed. > +1 to treating the DB-level stuff (i.e. system catalog showing how often was each statistics used, etc.) as a separate feature / patch. I did not mean to suggest this should be included in this patch, my point was that it's very different from what this patch aimed to do. > >>>> I may be wrong, but I don't quite see why would that be significantly >>>> easier to implement. You still need to track the information somewhere, >>>> and it probably needs for individual Path nodes. Because while building >>>> the plan you don't know which paths will get selected. >>> In my mind, gathering extended statistics usage data may be implemented >>> like it is already done for many parameters in the stat collector. It >>> may help identify whether statistics are used frequently or not. >> >>Isn't the statistics collector dealing with a completely different use >>case? Sure, maybe it would be interesting to track how many queries used >>which statistics object / for what purpose. A bit like we track counts >>for indexes etc. >> >>But that seems very different from what this patch aims to do, which is >>to track info about how a particular query used extended stats. >> >>I suppose we could have a "buffer" to remember which statistics objects >>were used by a single statement/transaction, just like we do for tables >>or indexes. But I still think we should show clauses estimated by each >>statistics object in a given query, and this would not help with that. > > Ideally, it would be nice to be able to show which extended statistics > were used for which clauses. However, based on the discussion so far, > this seems difficult to implement. > Maybe I'm missing something, but why would this be difficult to implement? There was a problem with RestrictInfo nodes, but doesn't Tom's suggestion [1] to use extract_actual_clauses() address this? I might have done something silly in the PoC patch. https://www.postgresql.org/message-id/1691487.1733180716@sss.pgh.pa.us > A second best solution would be to add a summary of the extended statistics > used at the end of the plan. This alone would make it easier for users to > use the extended statistics. > > The expected output is as follows: > - Do not output "Clauses: ". > - Display only the names of the extended statistics used. > - If displaying the names of multiple extended statistics, > should they be joined by commas? (Or should Ext Stats: be displayed > on multiple lines?) > The list of stats is better than no information at all, but I don't quite see why we couldn't show clauses, and to me the clauses seem like a pretty important piece of information. > e.g., > explain (stats) select 1 from t where a = 1 and b =1; > > ## Output from the current patch > QUERY PLAN > -------------------------------------------------------- > Seq Scan on t (cost=0.00..1943.00 rows=10210 width=4) > Filter: ((a = 1) AND (b = 1)) > Ext Stats: s Clauses: ((a = 1) AND (b = 1)) > (3 rows) > > > ## Output for displaying summary extended statistics > QUERY PLAN > -------------------------------------------------------- > Seq Scan on t (cost=0.00..1943.00 rows=10210 width=4) > Filter: ((a = 1) AND (b = 1)) > Planning: > Ext Stats: s <== Is it acceptable? > > If we do not display "Clause" information, I believe that there is > no need to deparse clauses, and therefore no need to improve the deparse > function to enable recursive traversal of expression trees. > What do you mean by the need to improve the deparse function? Is this about the nested RestrictInfos? Have you tried doing that Tom suggested in [1], or does it not address the issue? > To further the discussion, I will modify the patch to display the summary > as shown above and send it to -hackers. > > P.S. > I have tried to understand the discussion correctly, but please > let me know if I have misunderstood anything. > Thanks for working on this! regards -- Tomas Vondra
pgsql-hackers by date: