Thread: Find query characters in respect of optimizer for develop purpose

Find query characters in respect of optimizer for develop purpose

From
Andy Fan
Date:
Hello:

Before I want to pay attention to some optimizer features, I want to
estimate how much benefits it can create for customers, at least for our current
running customer. So I want to have some basic idea what kind of the query is
running now in respect of optimizer.  


My basic is we can track it with the below struct(every backend has one global 
variable to record it).

+typedef struct
+{
+       int     subplan_count;
+       int     subquery_count;
+       int join_count;
+       bool hasagg;
+       bool hasgroup;
+} QueryCharacters;

it will be reset at the beginning of standard_planner, and the values are
increased at  make_subplan, set_subquery_pathlist, make_one_rel,
create_grouping_paths. later it can be tracked and viewed in
pg_stat_statements.


What do you think about the requirement and the method I am thinking? Any
kind of feedback is welcome.


--
Best Regards
Andy Fan

Re: Find query characters in respect of optimizer for develop purpose

From
Melanie Plageman
Date:

On Mon, May 18, 2020 at 1:30 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hello:

Before I want to pay attention to some optimizer features, I want to
estimate how much benefits it can create for customers, at least for our current
running customer. So I want to have some basic idea what kind of the query is
running now in respect of optimizer.  


You are imagining this to be collected during planning on a live
customer system as a form of telemetry?
I was inspired to search the hackers mailing list archive for the word
"telemetry" and didn't get many hits, which surprised me.
 

My basic is we can track it with the below struct(every backend has one global 
variable to record it).

+typedef struct
+{
+       int     subplan_count;
+       int     subquery_count;
+       int join_count;
+       bool hasagg;
+       bool hasgroup;
+} QueryCharacters;

it will be reset at the beginning of standard_planner, and the values are
increased at  make_subplan, set_subquery_pathlist, make_one_rel,
create_grouping_paths. later it can be tracked and viewed in
pg_stat_statements.


I think the natural reaction to this idea is: isn't there a 3rd party
tool that does this? Or can't you use one of the hooks and write an
extension, to, for example, examine the parse,query,and plan trees?

However, it does seem like keeping track of this information would be
much easier during planning since planner will be examining the query
tree and making the plan anyway.

On the other hand, I think that depends a lot on what specific
information you want to collect. Out of the fields you listed, it is
unclear what some of them would mean.
Does join_count count the number of explicit joins in the original query
or does it count the number of joins in the final plan? Does
subquery_count count all sub-selects in the original query or does it
only count subqueries that become SubqueryScans or SubPlans? What about
subqueries that become InitPlans?

One concern I have is that it seems like this struct would have to be
updated throughout planning and that it would be easy to break it with
the addition of new code. Couldn't every new optimization added to
planner potentially affect the accuracy of the information in the
struct?

--
Melanie Plageman