Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
Date
Msg-id alpine.DEB.2.20.1612271718150.4911@lancre
Whole thread Raw
In response to Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
List pgsql-hackers
Hello Kyotaro-san,

> In nonterminal stmtmulti, setQueryLocation is added and used to
> calcualte and store the length of every stmt's. This needs an
> extra storage in bse_yy_extra_type

The extra storage is one int.

> and introduces a bit complicated stuff. But I think raw_parser can do 
> that without such extra storage and labor,

How? The issue is that stmtmulti silently skip some ';' when empty 
statements are found, so I need to keep track of that to know where to 
stop, using the beginning location of the next statement, which is 
probably your idea, does not work.

> then gram.y gets far simpler.


> The struct member to store the location and length is named
> 'qlocation', and 'qlength' in the added ParseNode. But many nodes
> already have 'location' of exactly the same purpopse. I don't see
> a necessity to differentiate them.

Alas, the "CreateTableSpaceStmt" struct already had a "char *location" 
that I did not want to change... If I use "location", then I have to 
change it, why not...

Another reason for the name difference is that qlocation/qlength 
convention is slightly different from location when not set: location is 
set manually to -1 when the information is not available, but this 
convention is not possible for statements without changing all their 
allocations and initializations (there are hundreds...), so the convention 
I used for qlocation/qlength is that it is not set if qlength is zero, 
which is the default value set by makeNode, so no change was needed...

Changing this point would mean a *lot* of possibly error prone changes in 
the parser code everywhere statements are allocated...

> Putting the two above together, the following is my suggestion
> for the parser part.
>
> - Make all parse nodes have the following two members at the
>  beginning. This unifies all parse node from the view of setting
>  their location. Commenting about this would be better.
>
>  | NodeTag   type;
>  | int       location;

Currently only a few parse nodes have locations, those about variables and 
constants that can be designated by an error message. I added the 
information to about 100 statements, but for the many remaining parse 
nodes the information does not exist and is not needed, so I would prefer 
to avoid adding a field both unset and unused.

> - Remove struct ParseNode

"ParseNode" is needed to access the location and length of statements, 
otherwise they are only "Node", which does not have a location.

> and setQueryLocation.

The function is called twice, I wanted to avoid duplicating code.

> stmtmulti sets location in the same manner to other kind of nodes, just 
> doing '= @n'. base_yyparse() doesn't calculate statement length.

But...

> - Make raw_parser caluclate statement length then store it into
>  each stmt scanning the yyextra.parsetree.

... I cannot calculate the statement length cleanly because of empty 
statements. If I know where the last ';' is, then the length computation 
must be when the reduction occurs, hence the function called from the 
stmtmulti rule.

> What do you think about this?

I think that I do not know how to compute the length without knowing where 
the last ';' was, because of how empty statements are silently skipped 
around the stmtmulti/stmt rules, so I think that the length computation 
should stay where it is.

What I can certainly do is:
 - add more comments to explain why it is done like that


What I could do with some inputs from reviewers/committers is:
 - rename the "char *location" field of the create table space statement   to "directory" or something else... but
what?
 - change qlocation/qlength to location/length...
   However, then the -1 convention for location not set is not true, which   is annoying. Using this convention means
hundredsof changes because every   statement allocation & initialization must be changed. This is   obviously possible,
butis a much larger patch, which I cannot say   would be much better than this one, it would just be different.
 

What I'm reserved about:
 - adding a location field to nodes that do not need it.

-- 
Fabien.



pgsql-hackers by date:

Previous
From: Claudio Freire
Date:
Subject: Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
Next
From: Greg Stark
Date:
Subject: Re: [HACKERS] pg_stat_activity.waiting_start