Re: Find query characters in respect of optimizer for develop purpose - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Find query characters in respect of optimizer for develop purpose
Date
Msg-id CAAKRu_Y64GmVLxjs0RTaUVLz7m5+D=6h=yM4Rw9GhiLhSNbkKg@mail.gmail.com
Whole thread Raw
In response to Find query characters in respect of optimizer for develop purpose  (Andy Fan <zhihui.fan1213@gmail.com>)
List pgsql-hackers

On Mon, May 18, 2020 at 1:30 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hello:

Before I want to pay attention to some optimizer features, I want to
estimate how much benefits it can create for customers, at least for our current
running customer. So I want to have some basic idea what kind of the query is
running now in respect of optimizer.  


You are imagining this to be collected during planning on a live
customer system as a form of telemetry?
I was inspired to search the hackers mailing list archive for the word
"telemetry" and didn't get many hits, which surprised me.
 

My basic is we can track it with the below struct(every backend has one global 
variable to record it).

+typedef struct
+{
+       int     subplan_count;
+       int     subquery_count;
+       int join_count;
+       bool hasagg;
+       bool hasgroup;
+} QueryCharacters;

it will be reset at the beginning of standard_planner, and the values are
increased at  make_subplan, set_subquery_pathlist, make_one_rel,
create_grouping_paths. later it can be tracked and viewed in
pg_stat_statements.


I think the natural reaction to this idea is: isn't there a 3rd party
tool that does this? Or can't you use one of the hooks and write an
extension, to, for example, examine the parse,query,and plan trees?

However, it does seem like keeping track of this information would be
much easier during planning since planner will be examining the query
tree and making the plan anyway.

On the other hand, I think that depends a lot on what specific
information you want to collect. Out of the fields you listed, it is
unclear what some of them would mean.
Does join_count count the number of explicit joins in the original query
or does it count the number of joins in the final plan? Does
subquery_count count all sub-selects in the original query or does it
only count subqueries that become SubqueryScans or SubPlans? What about
subqueries that become InitPlans?

One concern I have is that it seems like this struct would have to be
updated throughout planning and that it would be easy to break it with
the addition of new code. Couldn't every new optimization added to
planner potentially affect the accuracy of the information in the
struct?

--
Melanie Plageman

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Recording test runtimes with the buildfarm
Next
From: Tom Lane
Date:
Subject: Re: Recording test runtimes with the buildfarm