Before I want to pay attention to some optimizer features, I want to estimate how much benefits it can create for customers, at least for our current running customer. So I want to have some basic idea what kind of the query is running now in respect of optimizer.
You are imagining this to be collected during planning on a live customer system as a form of telemetry? I was inspired to search the hackers mailing list archive for the word "telemetry" and didn't get many hits, which surprised me.
My basic is we can track it with the below struct(every backend has one global
variable to record it).
+typedef struct +{ + int subplan_count; + int subquery_count; + int join_count; + bool hasagg; + bool hasgroup; +} QueryCharacters;
it will be reset at the beginning of standard_planner, and the values are increased at make_subplan, set_subquery_pathlist, make_one_rel, create_grouping_paths. later it can be tracked and viewed in pg_stat_statements.
I think the natural reaction to this idea is: isn't there a 3rd party tool that does this? Or can't you use one of the hooks and write an extension, to, for example, examine the parse,query,and plan trees?
However, it does seem like keeping track of this information would be much easier during planning since planner will be examining the query tree and making the plan anyway.
On the other hand, I think that depends a lot on what specific information you want to collect. Out of the fields you listed, it is unclear what some of them would mean. Does join_count count the number of explicit joins in the original query or does it count the number of joins in the final plan? Does subquery_count count all sub-selects in the original query or does it only count subqueries that become SubqueryScans or SubPlans? What about subqueries that become InitPlans?
One concern I have is that it seems like this struct would have to be updated throughout planning and that it would be easy to break it with the addition of new code. Couldn't every new optimization added to planner potentially affect the accuracy of the information in the struct?