Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries |
Date | |
Msg-id | alpine.DEB.2.20.1612271718150.4911@lancre Whole thread Raw |
In response to | Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
|
List | pgsql-hackers |
Hello Kyotaro-san, > In nonterminal stmtmulti, setQueryLocation is added and used to > calcualte and store the length of every stmt's. This needs an > extra storage in bse_yy_extra_type The extra storage is one int. > and introduces a bit complicated stuff. But I think raw_parser can do > that without such extra storage and labor, How? The issue is that stmtmulti silently skip some ';' when empty statements are found, so I need to keep track of that to know where to stop, using the beginning location of the next statement, which is probably your idea, does not work. > then gram.y gets far simpler. > The struct member to store the location and length is named > 'qlocation', and 'qlength' in the added ParseNode. But many nodes > already have 'location' of exactly the same purpopse. I don't see > a necessity to differentiate them. Alas, the "CreateTableSpaceStmt" struct already had a "char *location" that I did not want to change... If I use "location", then I have to change it, why not... Another reason for the name difference is that qlocation/qlength convention is slightly different from location when not set: location is set manually to -1 when the information is not available, but this convention is not possible for statements without changing all their allocations and initializations (there are hundreds...), so the convention I used for qlocation/qlength is that it is not set if qlength is zero, which is the default value set by makeNode, so no change was needed... Changing this point would mean a *lot* of possibly error prone changes in the parser code everywhere statements are allocated... > Putting the two above together, the following is my suggestion > for the parser part. > > - Make all parse nodes have the following two members at the > beginning. This unifies all parse node from the view of setting > their location. Commenting about this would be better. > > | NodeTag type; > | int location; Currently only a few parse nodes have locations, those about variables and constants that can be designated by an error message. I added the information to about 100 statements, but for the many remaining parse nodes the information does not exist and is not needed, so I would prefer to avoid adding a field both unset and unused. > - Remove struct ParseNode "ParseNode" is needed to access the location and length of statements, otherwise they are only "Node", which does not have a location. > and setQueryLocation. The function is called twice, I wanted to avoid duplicating code. > stmtmulti sets location in the same manner to other kind of nodes, just > doing '= @n'. base_yyparse() doesn't calculate statement length. But... > - Make raw_parser caluclate statement length then store it into > each stmt scanning the yyextra.parsetree. ... I cannot calculate the statement length cleanly because of empty statements. If I know where the last ';' is, then the length computation must be when the reduction occurs, hence the function called from the stmtmulti rule. > What do you think about this? I think that I do not know how to compute the length without knowing where the last ';' was, because of how empty statements are silently skipped around the stmtmulti/stmt rules, so I think that the length computation should stay where it is. What I can certainly do is: - add more comments to explain why it is done like that What I could do with some inputs from reviewers/committers is: - rename the "char *location" field of the create table space statement to "directory" or something else... but what? - change qlocation/qlength to location/length... However, then the -1 convention for location not set is not true, which is annoying. Using this convention means hundredsof changes because every statement allocation & initialization must be changed. This is obviously possible, butis a much larger patch, which I cannot say would be much better than this one, it would just be different. What I'm reserved about: - adding a location field to nodes that do not need it. -- Fabien.
pgsql-hackers by date: