Thread: [HACKERS] BUG: pg_stat_statements query normalization issues with combinedqueries
[HACKERS] BUG: pg_stat_statements query normalization issues with combinedqueries
From
Fabien COELHO
Date:
Hello, While investigating a performance issue, I tried to get informations from pg_stat_statements, however I ran into another issue: it seems that when using combined queries pg_stat_statements query normalization does not work properly... 2 queries that should have been mapped to only one are instead map to... 3 cases, as constants are not all ignored: query BEGIN ; + SELECT data FROM Stuff WHERE id = 1 ; + SELECT data FROM Stuff WHERE id = 2 ; +SELECT data FROM Stuff WHERE id = 3 ; + COMMIT; BEGIN ; + SELECT data FROM Stuff WHERE id = 4 ; + SELECT data FROM Stuff WHERE id = 5 ; +SELECT data FROM Stuff WHERE id = 6 ; + COMMIT; BEGIN ; + SELECT data FROM Stuff WHERE id = ? + SELECT data FROM Stuff WHERE id = 2 +SELECT data FROM Stuff WHERE id = 3 + COMMIT; I was expecting the 2 combined queries either to be separated in individual queries "SELECT data FROM Stuff WHERE id = ?" or in one large queries with three ?, but not the above result... -- Fabien
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
> I was expecting the 2 combined queries either to be separated in individual > queries "SELECT data FROM Stuff WHERE id = ?" or in one large queries with > three ?, but not the above result... Oops, I forgot the attachement, see repeat script on 9.6.1 -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
>> I was expecting the 2 combined queries either to be separated in individual >> queries "SELECT data FROM Stuff WHERE id = ?" or in one large queries with >> three ?, but not the above result... > > Oops, I forgot the attachement, see repeat script on 9.6.1 After some quick investigation, I concluded that the issue is that the whole combined query string is used instead of the part refering to the actual query being processed. As a result: - the full un-normalized string is used for both BEGIN & COMMIT => n shared entries, one for each actual query, begin& commit are mixed together - the fist-part only normalized string is used for each SELECT => 1 shared entry with "query" is the first partially normalied encountered combined query -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello again pgdevs, More investigations conclude that the information is lost by the parser. From a ;-separated string raw_parser returns a List<Node> which are directly the nodes of the statements, with empty statements silently skipped. The Node entry of high level statements (*Stmt) does NOT contain any location information, only some lower level nodes have it. A possible fix of this would be to add the query string location information systematically at the head of every high level *Stmt, which would then share both NodeTag and this information (say location and length). This would be a new intermediate kind of node of which all these statements would "inherit", say a "ParseNode" structure. The actual location information may be filled-in at quite a high level in the parser, maybe around "stmtmulti" and "stmt" rules, so it would not necessarily be too intrusive in the overall parser grammar. Then once the information is available, the query string may be cut where appropriate to only store the relevant string in pg_stat_statements or above. Would this approach be acceptable, or is modifying Nodes a no-go area? If it is acceptable, I can probably put together a patch and submit it. If not, I suggest to update the documentation to tell that pg_stat_statements does not work properly with combined queries. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Robert Haas
Date:
On Tue, Dec 20, 2016 at 6:18 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Would this approach be acceptable, or is modifying Nodes a no-go area? > > If it is acceptable, I can probably put together a patch and submit it. > > If not, I suggest to update the documentation to tell that > pg_stat_statements does not work properly with combined queries. I think you've found a bug, but I'm a little doubtful about your proposed fix. However, I haven't studied the code, so I don't know what other approach might be better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Kyotaro HORIGUCHI
Date:
At Tue, 20 Dec 2016 22:42:48 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZT94brLAGK7gCmxB4mO=C-Cdz1N8KN8Xen4sexHozN=Q@mail.gmail.com> > On Tue, Dec 20, 2016 at 6:18 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > Would this approach be acceptable, or is modifying Nodes a no-go area? > > > > If it is acceptable, I can probably put together a patch and submit it. > > > > If not, I suggest to update the documentation to tell that > > pg_stat_statements does not work properly with combined queries. > > I think you've found a bug, but I'm a little doubtful about your > proposed fix. However, I haven't studied the code, so I don't know > what other approach might be better. That will work and doable, but the more fundamental issue here seems to be that post_parse_analyze_hook or other similar hooks are called with a Query incosistent with query_debug_string. It is very conveniently used but the name seems to me to be suggesting that such usage is out of purpose. I'm not sure, though. A similar behavior is shown in error messages but this harms far less than the case of pg_stat_statements. ERROR: column "b" does not exist LINE 1: ...elect * from t where a = 1; select * from t where b = 2; com... ^ 1. Let all of the parse node have location in debug_query_string. (The original proposal) 2. Let the parser return once for each statement (like psql parser) and corresponding query string is stored in a varialbleother than debug_query_string. ... regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello Robert & Kyotaro, >> I'm a little doubtful about your proposed fix. However, I haven't >> studied the code, so I don't know what other approach might be better. That is indeed the question! The key point is that the parser parses several statements from a string, but currently there is no clue about where the queries was found in the string. Only a parser may know about what is being parsed so can generate the location information. Now the possible solutions I see are: - the string is split per query before parsing, but this requires at least a lexer... it looks pretty uneffective to haveanother lexing phase involved, even if an existing lexer is reused. - the parser processes one query at a time and keep the "remaining unparse string" in some state that can be queried tocheck up to where it proceeded, but then the result must be stored somewhere and it would make sense that it wouldbe in the statement anyway, just the management of the location information would be outside the parser. Also thatwould add the cost of relaunching the parser, not sure how bad or insignificant this is. This is (2) below. - the parser still generates a List<Node> but keep track of the location of statements doing so, somehow... propably asI outlined. The query string information can be at some way of pointing in the initial string, or the substring itself that could be extracted at some point. I initially suggested the former because this is already what the parser does for some nodes, and because it seems simpler to do so. Extracting the string instead would suggest that the location of tokens within this statement are relative to this string rather than the initial one, but that involves a lot more changes and it is easier to miss something doing so. > That will work and doable, but the more fundamental issue here > seems to be that post_parse_analyze_hook or other similar hooks > are called with a Query incosistent with query_debug_string. Sure. > It is very conveniently used but the name seems to me to be suggesting > that such usage is out of purpose. I'm not sure, though. > > A similar behavior is shown in error messages but this harms far > less than the case of pg_stat_statements. > > ERROR: column "b" does not exist > LINE 1: ...elect * from t where a = 1; select * from t where b = 2; com... > ^ Ah, I wrote this piece of code a long time ago:-) The location is relative to the full string, see comment above, changing that would involve much more changes, and I'm not sure whether it is desirable. Also, think of: SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo; Maybe the context to know which "SELECT * FROM foo" generates an error should be kept. > 1. Let all of the parse node have location in > debug_query_string. (The original proposal) Yep. > 2. Let the parser return once for each statement (like psql > parser) I'm not sure it does... "\;" generates ";" in the output and the psql lexer keeps on lexing. > and corresponding query string is stored in a > varialble other than debug_query_string. I think that would involve many changes because of the way postgres is written, the list is expected and returned by quite a few functions. Moreover query rewriting may generate several queries out of one anyway, so the list would be kept. So I'm rather still in favor of my initial proposal, that is extend the existing location information to statements, not only some tokens. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Kyotaro HORIGUCHI
Date:
At Wed, 21 Dec 2016 09:28:58 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1612210841370.3892@lancre> > > Hello Robert & Kyotaro, > > >> I'm a little doubtful about your proposed fix. However, I haven't > >> studied the code, so I don't know what other approach might be better. > > That is indeed the question! > > The key point is that the parser parses several statements from a > string, but currently there is no clue about where the queries was > found in the string. Only a parser may know about what is being parsed > so can generate the location information. Now the possible solutions I > see are: > > - the string is split per query before parsing, but this requires at > least a lexer... it looks pretty uneffective to have another lexing > phase involved, even if an existing lexer is reused. I don't see this is promising. Apparently a waste of CPU cycles. > - the parser processes one query at a time and keep the "remaining > unparse string" in some state that can be queried to check up to > where it proceeded, but then the result must be stored somewhere > and it would make sense that it would be in the statement anyway, > just the management of the location information would be outside > the parser. Also that would add the cost of relaunching the parser, > not sure how bad or insignificant this is. This is (2) below. I raised this as a spoiler, I see this is somewhat too invasive for the problem to be solved. > - the parser still generates a List<Node> but keep track of the location > of statements doing so, somehow... propably as I outlined. Yeah, this seems most reasonable so far. It seems to me to be better that the statement location is conveyed as a part of a parse tree so as not to need need a side channel for location. I'd like to rename debug_query_string to more reasonable name if we are allowed but perhaps not. > The query string information can be at some way of pointing in the > initial > string, or the substring itself that could be extracted at some point. > I initially suggested the former because this is already what the > parser does for some nodes, and because it seems simpler to do so. > > Extracting the string instead would suggest that the location of > tokens > within this statement are relative to this string rather than the > initial one, but that involves a lot more changes and it is easier to > miss something doing so. Agreed that copying statement string would be too much. But the new *Stmt node should somehow have also the end location of the statement since the user of a parse tree cannot look into another one. > > That will work and doable, but the more fundamental issue here > > seems to be that post_parse_analyze_hook or other similar hooks > > are called with a Query incosistent with query_debug_string. > > Sure. > > > It is very conveniently used but the name seems to me to be suggesting > > that such usage is out of purpose. I'm not sure, though. > > > > A similar behavior is shown in error messages but this harms far > > less than the case of pg_stat_statements. > > > > ERROR: column "b" does not exist > > LINE 1: ...elect * from t where a = 1; select * from t where b = 2; > > com... > > ^ > > Ah, I wrote this piece of code a long time ago:-) The location is > relative to the full string, see comment above, changing that would > involve much > more changes, and I'm not sure whether it is desirable. > > Also, think of: > > SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo; > > Maybe the context to know which "SELECT * FROM foo" generates an error > should be kept. > > > 1. Let all of the parse node have location in > > debug_query_string. (The original proposal) > > Yep. > > > 2. Let the parser return once for each statement (like psql > > parser) > > I'm not sure it does... "\;" generates ";" in the output and the psql > lexer keeps on lexing. > > > and corresponding query string is stored in a > > varialble other than debug_query_string. > > I think that would involve many changes because of the way postgres is > written, the list is expected and returned by quite a few > functions. Moreover query rewriting may generate several queries out > of one anyway, so the list would be kept. > > So I'm rather still in favor of my initial proposal, that is extend > the existing location information to statements, not only some tokens. I thought that it's too much to let all types of parse node have location but grepping the gram.y with "makeNode" pursuaded me to agree with you. After changing all *Stmt nodes, only several types of nodes seems missing it. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello Kyotaro-san, > [...] Agreed that copying statement string would be too much. But the > new *Stmt node should somehow have also the end location of the > statement since the user of a parse tree cannot look into another one. Yes. I was thinking of adding a "length" field next to "location", where appropriate. >> So I'm rather still in favor of my initial proposal, that is extend >> the existing location information to statements, not only some tokens. > > I thought that it's too much to let all types of parse node have > location but grepping the gram.y with "makeNode" pursuaded me to > agree with you. Indeed... > After changing all *Stmt nodes, only several types of nodes seems > missing it. Yes. I'll try to put together a patch and submit it to the next CF. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
> Yes. I'll try to put together a patch and submit it to the next CF. Here it is. I'll add this to the next CF. This patch fixes the handling of compound/combined queries by pg_stat_statements (i.e. several queries sent together, eg with psql: "SELECT 1 \; SELECT 2;"). This bug was found in passing while investigating a strange performance issue I had with compound statements. Collect query location information in parser: * create a ParseNode for statements with location information fields (qlocation & qlength). * add these fields to all parsed statements (*Stmt), Query and PlannedStmt. * statements location is filled in: - the lexer keep track of the last ';' encountered. - the information is used by the parser to compute the query length, which depends on whether the query ended on ';' or on the input end. - the query location is propagated to Query and PlannedStmt when built. Fix pg_stat_statement: * the information is used by pg_stat_statement so as to extract the relevant part of the query string, including some trimming. * pg_stat_statement validation is greatly extended so as to test options and exercise various cases, including compound statements. note 1: two non-statements tags (use for alter table commands) have been moved so that all statements tags are contiguous and easy to check. note 2: the impact on the lexer & parser is quite minimal with this approach, about 30 LOC, most of which in one function. The largest changes are in the node header to add location fields. note 3: the query length excludes the final separator, so that ;-terminated and end of input terminated queries show the same. note 4: the added test suggests that when tracking "all", queries in SQL user functions are not tracked, this might be a bug. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
>> Yes. I'll try to put together a patch and submit it to the next CF. > > Here it is. I'll add this to the next CF. Oops... better without a stupid overflow. Shame on me! -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Craig Ringer
Date:
On 21 Dec. 2016 11:44, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Tue, Dec 20, 2016 at 6:18 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:I think you've found a bug, but I'm a little doubtful about your
> Would this approach be acceptable, or is modifying Nodes a no-go area?
>
> If it is acceptable, I can probably put together a patch and submit it.
>
> If not, I suggest to update the documentation to tell that
> pg_stat_statements does not work properly with combined queries.
proposed fix. However, I haven't studied the code, so I don't know
what other approach might be better.
FWIW this issue with multi-statements also causes issues with ProcessUtility_hook. It gets the char* querytext of the whole multistatement. Then gets invoked once for each utility command within. It has no information about which statement text the current invocation corresponds to.
Having a.pointer into the query text for the start and end would be good there too. Not as good as doing away with multis entirely as a bad hack but that's not practical for BC and protocol reasons.
BTW we should be sure the somewhat wacky semantics of multi-statements with embedded commits are documented. I'll check tomorrow.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Craig Ringer
Date:
On 23 Dec. 2016 17:53, "Fabien COELHO" <coelho@cri.ensmp.fr> wrote:
Here it is. I'll add this to the next CF.Yes. I'll try to put together a patch and submit it to the next CF.
Great. Please put me (ringerc) down as a reviewer. I'll get to this as soon as holidays settle down. It'd be very useful for some things I'm working on too, and is much better then my early draft of similar functionality.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello Craig, > Please put me (ringerc) down as a reviewer. Done. Please do not loose time reviewing stupid version 1... skip to version 2 directly:-) Also, note that the query-location part may be considered separately from the pg_stat_statements part which uses it. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Kyotaro HORIGUCHI
Date:
Hello, At Mon, 26 Dec 2016 16:00:57 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1612261554510.4911@lancre> > > Hello Craig, > > > Please put me (ringerc) down as a reviewer. > > Done. > > Please do not loose time reviewing stupid version 1... skip to version > 2 directly:-) > > Also, note that the query-location part may be considered separately > from the pg_stat_statements part which uses it. In nonterminal stmtmulti, setQueryLocation is added and used to calcualte and store the length of every stmt's. This needs an extra storage in bse_yy_extra_type and introduces a bit complicated stuff. But I think raw_parser can do that without such extra storage and labor, then gram.y gets far simpler. The struct member to store the location and length is named 'qlocation', and 'qlength' in the added ParseNode. But many nodes already have 'location' of exactly the same purpopse. I don't see a necessity to differentiate them. Putting the two above together, the following is my suggestion for the parser part. - Make all parse nodes have the following two members at the beginning. This unifies all parse node from the view of settingtheir location. Commenting about this would be better. | NodeTag type; | int location; - Remove struct ParseNode and setQueryLocation. stmtmulti sets location in the same manner to other kind of nodes, just doing'= @n'. base_yyparse() doesn't calculate statement length. - Make raw_parser caluclate statement length then store it into each stmt scanning the yyextra.parsetree. What do you think about this? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Kyotaro HORIGUCHI
Date:
Hello, At Tue, 27 Dec 2016 10:28:53 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161227.102853.204155513.horiguchi.kyotaro@lab.ntt.co.jp> > Putting the two above together, the following is my suggestion > for the parser part. > > - Make all parse nodes have the following two members at the > beginning. This unifies all parse node from the view of setting > their location. Commenting about this would be better. > > | NodeTag type; > | int location; > > - Remove struct ParseNode and setQueryLocation. stmtmulti sets > location in the same manner to other kind of nodes, just doing > '= @n'. base_yyparse() doesn't calculate statement length. > > - Make raw_parser caluclate statement length then store it into > each stmt scanning the yyextra.parsetree. An additional comment on parser(planner?) part. This patch make planner() copy the location and length from parse to result, but copying such stuffs is a job of standard_planner. Then the following is a comment on pg_stat_statements.c - pg_stat_statements.c:977 - isParseNode(node) node should be parsenode. - The name for the addional parameter query_loc is written as query_len in its prototype. - In pgss_store, "else if (strlen(query)) != query_len)" doesn't work as expected because of one-off error. query_len hereis the length of the query *excluding* the last semicolon. - qtext_store doesn't rely on terminating nul character and the function already takes query length as a parameter. So, there'sno need to copy the partial debug_query_string into palloc'ed memory. Just replacing the query with query_loc willbe sufficient. Have a nice holidays. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello Kyotaro-san, > In nonterminal stmtmulti, setQueryLocation is added and used to > calcualte and store the length of every stmt's. This needs an > extra storage in bse_yy_extra_type The extra storage is one int. > and introduces a bit complicated stuff. But I think raw_parser can do > that without such extra storage and labor, How? The issue is that stmtmulti silently skip some ';' when empty statements are found, so I need to keep track of that to know where to stop, using the beginning location of the next statement, which is probably your idea, does not work. > then gram.y gets far simpler. > The struct member to store the location and length is named > 'qlocation', and 'qlength' in the added ParseNode. But many nodes > already have 'location' of exactly the same purpopse. I don't see > a necessity to differentiate them. Alas, the "CreateTableSpaceStmt" struct already had a "char *location" that I did not want to change... If I use "location", then I have to change it, why not... Another reason for the name difference is that qlocation/qlength convention is slightly different from location when not set: location is set manually to -1 when the information is not available, but this convention is not possible for statements without changing all their allocations and initializations (there are hundreds...), so the convention I used for qlocation/qlength is that it is not set if qlength is zero, which is the default value set by makeNode, so no change was needed... Changing this point would mean a *lot* of possibly error prone changes in the parser code everywhere statements are allocated... > Putting the two above together, the following is my suggestion > for the parser part. > > - Make all parse nodes have the following two members at the > beginning. This unifies all parse node from the view of setting > their location. Commenting about this would be better. > > | NodeTag type; > | int location; Currently only a few parse nodes have locations, those about variables and constants that can be designated by an error message. I added the information to about 100 statements, but for the many remaining parse nodes the information does not exist and is not needed, so I would prefer to avoid adding a field both unset and unused. > - Remove struct ParseNode "ParseNode" is needed to access the location and length of statements, otherwise they are only "Node", which does not have a location. > and setQueryLocation. The function is called twice, I wanted to avoid duplicating code. > stmtmulti sets location in the same manner to other kind of nodes, just > doing '= @n'. base_yyparse() doesn't calculate statement length. But... > - Make raw_parser caluclate statement length then store it into > each stmt scanning the yyextra.parsetree. ... I cannot calculate the statement length cleanly because of empty statements. If I know where the last ';' is, then the length computation must be when the reduction occurs, hence the function called from the stmtmulti rule. > What do you think about this? I think that I do not know how to compute the length without knowing where the last ';' was, because of how empty statements are silently skipped around the stmtmulti/stmt rules, so I think that the length computation should stay where it is. What I can certainly do is: - add more comments to explain why it is done like that What I could do with some inputs from reviewers/committers is: - rename the "char *location" field of the create table space statement to "directory" or something else... but what? - change qlocation/qlength to location/length... However, then the -1 convention for location not set is not true, which is annoying. Using this convention means hundredsof changes because every statement allocation & initialization must be changed. This is obviously possible, butis a much larger patch, which I cannot say would be much better than this one, it would just be different. What I'm reserved about: - adding a location field to nodes that do not need it. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
> An additional comment on parser(planner?) part. > > This patch make planner() copy the location and length from > parse to result, but copying such stuffs is a job of > standard_planner. I put the copy in planner because standard_planner may not be called by planner, and in all cases I think that the fields should be copied... Otherwise it is the responsability of the planner hook to do the copy, it would is ok if the planner hook calls standard_planner, but the fact is not warranted, the comments says "the plugin would normally call standard_planner". What if it does not? So it seemed safer this way. > Then the following is a comment on pg_stat_statements.c > > - pg_stat_statements.c:977 - isParseNode(node) > node should be parsenode. Could be. ParseNode is a Node, it is just a pointer cast. I can assert on pn instead of node. > - The name for the addional parameter query_loc is written as > query_len in its prototype. Indeed, a typo on "generate_normalized_query" prototype. > - In pgss_store, "else if (strlen(query)) != query_len)" doesn't > work as expected because of one-off error. query_len here is > the length of the query *excluding* the last semicolon. It was somehow intentional: if the query is not the same as the string, then a copy is performed to have the right null-terminated string... but > - qtext_store doesn't rely on terminating nul character and the > function already takes query length as a parameter. So, there's > no need to copy the partial debug_query_string into palloc'ed > memory. Just replacing the query with query_loc will be > sufficient. Hmmm... it seemed good so I tried it, and it did not work... The subtle reason is that qtext_store does assume that the query string is null terminated... it just does not recompute the length its length. However it can be changed to behave correctly in this case, so as to avoid the copy. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes: > How? The issue is that stmtmulti silently skip some ';' when empty > statements are found, so I need to keep track of that to know where to > stop, using the beginning location of the next statement, which is > probably your idea, does not work. Maybe you should undo that. I've generally found that trying to put optimizations into the grammar is a bad idea; it's better to KISS in gram.y and apply optimizations later on. >> - Make all parse nodes have the following two members at the >> beginning. This unifies all parse node from the view of setting >> their location. Commenting about this would be better. >> >> | NodeTag type; >> | int location; I do not like this. You are making what should be a small patch into a giant, invasive one. I'd think about adding a single parse node type that corresponds to "stmt" in the grammar and really doesn't do anything but carry the location info for the overall statement. That info could then be transferred into the resulting Query node. Problem solved with (I think) very little code touched. regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello Tom, >> How? The issue is that stmtmulti silently skip some ';' when empty >> statements are found, [...] > > Maybe you should undo that. I was trying to be minimally invasive in the parser, i.e. not to change any rules. > I've generally found that trying to put optimizations into the grammar > is a bad idea; it's better to KISS in gram.y and apply optimizations > later on. I can change rules, but I'm not sure it will be better. It would allow to remove the last ';' location in the lexer which Kyotar does not like, but it would add some new stretching to compute the length of statements and remove the empty statements. I would say that it would be different, but not necessarily better. >>> - Make all parse nodes have the following two members at the >>> beginning. This unifies all parse node from the view of setting >>> their location. Commenting about this would be better. >>> >>> | NodeTag type; >>> | int location; > > I do not like this. You are making what should be a small patch into > a giant, invasive one. I would not say that the current patch is giant & invasive, if you abstract the two added fields to high-level statements. > I'd think about adding a single parse node type that corresponds to > "stmt" in the grammar and really doesn't do anything but carry the > location info for the overall statement. That info could then be > transferred into the resulting Query node. I'm not sure I understand your suggestion. Currently I have added the location & length information to all high-level statements nodes, plus query and planned. I think that planned is necessary for utility statements. I understand that you suggest to add a new intermediate structure: typedef struct { NodeTag tag; int location, length; Node *stmt; } ParseNode; So that raw_parser would return a List<ParseNode> instead of a List<Node>, and the statements would be unmodified. > Problem solved with (I think) very little code touched. Hmmm... Then raw_parser() callers have to manage a List<ParseNode> instead of the List<Node>, I'm not sure of the size of the propagation to manage the added indirection level appropriately: raw_parser is called 4 times (in parser, tcop, commands, plpgsql...). The first call in tcop is copyObject(), equal(), check_log_statement(), errdetail_execute()... Probably it can be done, but I'm moderately unthousiastic: it looks like starting unwinding a ball of wool, and I'm not sure where it would stop, as it means touching at least the 4 uses of raw_parser in 4 distinct part of postgres, whereas the current approach did not change anything for raw_parser callers, although at the price of modifying all statement nodes. Did I read you correctly? -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes: >>> How? The issue is that stmtmulti silently skip some ';' when empty >>> statements are found, [...] >> Maybe you should undo that. > I was trying to be minimally invasive in the parser, i.e. not to change > any rules. That's fairly silly when the alternative is to be maximally invasive at the parse-nodes level, thereby affecting lots and lots of code that isn't really related to the problem at hand. >> I do not like this. You are making what should be a small patch into >> a giant, invasive one. > I would not say that the current patch is giant & invasive, if you > abstract the two added fields to high-level statements. It's touching every single utility statement type, which is not only pretty invasive in itself, but will break any pending patches that add more utility statement types. And you've not followed through on the implications of adding those fields in, for instance, src/backend/nodes/; so the patch will be even bigger once all the required tidying-up is done. > I understand that you suggest to add a new intermediate structure: > typedef struct { > NodeTag tag; > int location, length; > Node *stmt; > } ParseNode; > So that raw_parser would return a List<ParseNode> instead of a List<Node>, > and the statements would be unmodified. Yeah, that's what I was thinking of. There aren't very many places that would need to know about that, I believe; possibly just gram.y itself and transformTopLevelStmt(). The stuff in between only knows that it's passing around lists of statement parse trees, it doesn't know much about what's in those trees. regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Craig Ringer
Date:
On 28 December 2016 at 08:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I would not say that the current patch is giant & invasive, if you >> abstract the two added fields to high-level statements. > > It's touching every single utility statement type, which is not only > pretty invasive in itself, but will break any pending patches that > add more utility statement types. And you've not followed through on the > implications of adding those fields in, for instance, src/backend/nodes/; > so the patch will be even bigger once all the required tidying-up is done. I echo Tom's complaint. Adding fields to every node is definitely not the way to do this. >> I understand that you suggest to add a new intermediate structure: > >> typedef struct { >> NodeTag tag; >> int location, length; >> Node *stmt; >> } ParseNode; > >> So that raw_parser would return a List<ParseNode> instead of a List<Node>, >> and the statements would be unmodified. > > Yeah, that's what I was thinking of. I strongly agree. That's what I'd done in my draft patch, but I hadn't got the details of actually collecting position tracking info sorted out yet. The information will need to be copied to plans later, but that's easy, and would've been necessary anyway. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training &Services
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello Tom, > [...] It's touching every single utility statement type, which is not > only pretty invasive in itself, but will break any pending patches that > add more utility statement types. Yep, but it is limited to headers and the break is trivial... > And you've not followed through on the implications of adding those > fields in, for instance, src/backend/nodes/; Hmmm, probably you are pointing out structure read/write functions. I would have hoped that such code would be automatically derived from the corresponding struct definition... >> I understand that you suggest to add a new intermediate structure [...] >> So that raw_parser would return a List<ParseNode> instead of a >> List<Node>, and the statements would be unmodified. > > Yeah, that's what I was thinking of. There aren't very many places that > would need to know about that, I believe; [...] Hmmm. I've run into a tiny small ever so little detail in trying to apply this clever approach... For fixing the information in pg_stat_statement, the location data must be transported from the parsed node to the query to the planned node, because the later two nodes types are passed to different hooks. Now the detail is that utility statements, which seems to be nearly all of them but select/update/delete/insert, do not have plans: The statement itself is its own plan... so there is no place to store the location & length. As adding such fields to store the information in the structures is no go area, I'm hesitating about the course to follow to provide a solution acceptable to you. Would you have any other clever proposition or advice about how to proceed? I thought of creating yet another node "utility plans with locations" or maybe reuse the "parsed node" for the purpose, but then it has to be managed in quite a few places as well. For the parser output, there is 4 raw_parser calls but also 10 pg_parse_query calls to manage. I'm not sure that going this way will fit the "few lines of code" bill... -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Craig Ringer
Date:
On 28 December 2016 at 20:11, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > Hello Tom, > >> [...] It's touching every single utility statement type, which is not only >> pretty invasive in itself, but will break any pending patches that add more >> utility statement types. > > > Yep, but it is limited to headers and the break is trivial... I'm not sure how else that part can be done. We can add a common parent for all utility statement types, but that also touches all utility statements, and since we're working with pure C, it means 'type' won't be accessible as e.g. someCreateStmt.type, but only as something like ((UtilityStatement)someCreateStmt).type or someCreateStmt.hdr.type . e.g. typedef struct UtilityStatement { NodeTag type; int querytext_offset; int querytext_length; }; typedef struct CreateStmt { UtilityStatement hdr; RangeVar *relation; /* relation to create */ .... } I don't think that's so bad, but it's not exactly less invasive. >> And you've not followed through on the implications of adding those fields >> in, for instance, src/backend/nodes/; > > Hmmm, probably you are pointing out structure read/write functions. I'm sure that's his intent, yes. > I would have hoped that such code would be automatically derived from the > corresponding struct definition... That would be nice. But no, unfortunately, we have no such preprocessing. To do so would require that we either generate things like parsenodes.h and the relevant parts of copyfuncs.c etc from a common source, or have a toolchain that can ingest parsenodes.h and emit suitable code. The latter means either "fragile, hideous Perl script" or requiring something like LLVM or the gcc frontend libraries in the toolchain to parse the C code and emit an intermediate representation of the structures that we could then generate our functions from. If you've come from the Java world you'd expect that we'd just generate the copy and makenode stuff from introspection of the node structs, or preprocess them or something, but it's not very practical. They don't change enough to really make it worth it either. > Hmmm. I've run into a tiny small ever so little detail in trying to apply > this clever approach... > > For fixing the information in pg_stat_statement, the location data must be > transported from the parsed node to the query to the planned node, because > the later two nodes types are passed to different hooks. Yep. > Now the detail is that utility statements, which seems to be nearly all of > them but select/update/delete/insert, do not have plans: The statement > itself is its own plan... so there is no place to store the location & > length. Yeah. I don't see any way around adding location info for utility statements one way or the other. We could wrap every utility statement up in a container with the location info, but that'd break every user of ProcessUtility_hook and unless we put the container as the first by-value member of each utility struct anyway, it'd mean extra allocations for each utility statement. Not very appealing. TBH I think that for utility statements, adding a member struct with location info is the least-bad option. Something like: /** Location of the original querytext of an individual parsetree or plan relative to the* start of the querytext. This is usually offset 0 and the length of the querytext its self,* but in multi-statements and other cases where a single parse has multiple products* hooks may need to know which part of the querytext corresponds to which product* plan.** offset and length are both -1 for plans with no corresponding querytext, such as* additional/replacement queries emitted by the rewrite phase or additional statements* produced as a result of processing some utility statements.* typedef struct StmtLocation { int offset; int length; } typedef struct CreateStmt { NodeTag type; StmtLocation location; .... } Personally I don't much care if we do it this way, or by embedding the NodeTag too and creating a common UtilityStmt. The latter probably has benefits for later extension, but is a bit more annoying in the shorter term. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training &Services
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello Craig, >>> [...] It's touching every single utility statement type, which is not >>> only pretty invasive in itself, but will break any pending patches >>> that add more utility statement types. >> >> Yep, but it is limited to headers and the break is trivial... > > I'm not sure how else that part can be done. > > We can add a common parent for all utility statement types, but that > also touches all utility statements, and since we're working with pure > C, it means 'type' won't be accessible as e.g. someCreateStmt.type, but > only as something like ((UtilityStatement)someCreateStmt).type or > someCreateStmt.hdr.type. [...] I don't think that's so bad, but it's not > exactly less invasive. I thought of this way of implementing that on the submitted version, I decided against because then it is less obvious what is directly in the structure, existing code may reference the tag field, and the number of header changes is the same, if a little less verbose. >> [...] For fixing the information in pg_stat_statement, the location >> data must be transported from the parsed node to the query to the >> planned node, because the later two nodes types are passed to different >> hooks. > > Yep. > >> Now the detail is that utility statements, which seems to be nearly all of >> them but select/update/delete/insert, do not have plans: The statement >> itself is its own plan... so there is no place to store the location & >> length. > > Yeah. I don't see any way around adding location info for utility > statements one way or the other. If utility statements are done this way, that's 95% of all statements, so the point of doing the remaining 5% with Tom's neat intermediate node trick seems void, I think that it is better to have all of them the same way. > TBH I think that for utility statements, adding a member struct with > location info is the least-bad option. Something like: > typedef struct StmtLocation > { > int offset; > int length; > } It is possible, but: > typedef struct CreateStmt > { > NodeTag type; > StmtLocation location; > .... > } Name "location" is already used meaning the offset or the directory destination for create table space, that would create a third option. For homogeneity, ISTM that keeping location & length directly and renaming the table space location is the simplest & most homogeneous option. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello, >> Yeah, that's what I was thinking of. There aren't very many places that >> would need to know about that, I believe; [...] > > For fixing the information in pg_stat_statement, the location data must be > transported from the parsed node to the query to the planned node, because > the later two nodes types are passed to different hooks. > > Now the detail is that utility statements, which seems to be nearly all of > them but select/update/delete/insert, do not have plans: The statement itself > is its own plan... so there is no place to store the location & length. Here is an updated version: Changes wrt v2: - I have added the missing stuff under /nodes/, this is stupid code that should be automatically generated:-( - I have added comments in "gram.y" about how the length is computed. I have also slightly simplified the rule code there. - I have rename "location" in create table space to "location_dir" to avoid confusion. - I have renamed the fields "location" and "length" instead of q*. - I have moved the location & lenth copies in standard_planner. - I have fixed the function declaration typo. - I have simplified pgss_store code to avoid a copy, and move the length truncation in qtext_store. - I have improved again the pg_stat_statement regression tests with combined utility statement tests, which implied some fixes to extract the right substring for utility queries. However, not changed: - I cannot use the intermediate node trick suggested by Tom because it does not work for utility statements which do not have plans, so the code still adds location & length, sorry. - I still use the 'last_semicolon' lexer variable. The alternative is to change rules so as not to skip empty statements, then write a loop to compute the length based on successor location, and remove the empty statements. It can be done, I do not think it is better, it is only different and more verbose. I'll do it if required by a committer. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Kyotaro HORIGUCHI
Date:
Hello, At Fri, 30 Dec 2016 15:10:42 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1612301453280.32017@lancre> > > Hello, > > >> Yeah, that's what I was thinking of. There aren't very many places > >> that > >> would need to know about that, I believe; [...] > > > > For fixing the information in pg_stat_statement, the location data > > must be transported from the parsed node to the query to the planned > > node, because the later two nodes types are passed to different hooks. > > > > Now the detail is that utility statements, which seems to be nearly > > all of them but select/update/delete/insert, do not have plans: The > > statement itself is its own plan... so there is no place to store the > > location & length. > > Here is an updated version: > > Changes wrt v2: > > - I have added the missing stuff under /nodes/, this is stupid code that > should be automatically generated:-( > > - I have added comments in "gram.y" about how the length is computed. > I have also slightly simplified the rule code there. > > - I have rename "location" in create table space to "location_dir" > to avoid confusion. > > - I have renamed the fields "location" and "length" instead of q*. > > - I have moved the location & lenth copies in standard_planner. > > - I have fixed the function declaration typo. > > - I have simplified pgss_store code to avoid a copy, and move the > length truncation in qtext_store. > > - I have improved again the pg_stat_statement regression tests with > combined utility statement tests, which implied some fixes to > extract the right substring for utility queries. > > However, not changed: > > - I cannot use the intermediate node trick suggested by Tom because > it does not work for utility statements which do not have plans, so > the code still adds location & length, sorry. Wrapping the output of pg_parse_query still works for non-utility commands. But pg_stat_statement retrieves the location information via ProcessUtility, which has plantree. Wrapping plantree with the same struct also works but it imacts too widely and extremely error-prone. One available solution is, as Fabien did, bring location and length data along with transformation. And anther is give a chopped query to query tree. The attached patch does the second way (but quite ugly and incomplete, it's PoC). This seems working as expected to a certain degree but somewhat bogus.. A significant drawback of this is that this makes a copy of each query in multistatement. > - I still use the 'last_semicolon' lexer variable. The alternative is to > change rules so as not to skip empty statements, then write a loop to > compute the length based on successor location, and remove the empty > statements. It can be done, I do not think it is better, it is only > different and more verbose. I'll do it if required by a committer. I think this is doable in the way shown in this patch. But this seems somewhat bogus, too.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > At Fri, 30 Dec 2016 15:10:42 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1612301453280.32017@lancre> >> - I cannot use the intermediate node trick suggested by Tom because >> it does not work for utility statements which do not have plans, so >> the code still adds location & length, sorry. I still really, really don't like this patch. I think it's going to create continuing maintenance problems, because of shortcuts like this: +#define isParseNodeTag(tag) ((T_Query <= (tag)) && ((tag) < T_A_Expr)) We've never had any code that depends on ranges of nodetag numbers, and now is not the time to start. (The fact that you had to randomly relocate some existing tags to make this work didn't make me any happier about it.) >> - I still use the 'last_semicolon' lexer variable. The alternative is to >> change rules so as not to skip empty statements, then write a loop to >> compute the length based on successor location, and remove the empty >> statements. It can be done, I do not think it is better, it is only >> different and more verbose. I'll do it if required by a committer. > I think this is doable in the way shown in this patch. But this > seems somewhat bogus, too.. Yeah, I doubt that this technique for getting the raw locations in the grammar+lexer works reliably. In general, Bison is a bit asynchronous in terms of scanning, and may or may not have scanned a token ahead of what the current production covers. So having production rules that look at the scanner state is just dangerous. You should be relying on the Bison locations mechanism (@N), instead. That has some gotchas of its own with respect to locations of nonterminals, but at least they're known --- and we could fix them if we had to. Anyway, I decided to see what it would take to do it the way I had in mind, which was to stick a separate RawStmt node atop each statement parsetree. The project turned out a bit larger than I hoped :-(, but I'm really pleased with the end result, because it makes the APIs around lists of statements much simpler and more regular than they used to be. I would probably propose most of the attached patch for adoption even if we weren't interested in tracking statement boundaries. In brief, what this does is: * The output of raw parsing is now always a list of RawStmt nodes; the statement-type-dependent nodes are one level down from that. This is similar to the existing rule that the output of parse analysis is always a list of Query nodes, even though the Query struct is mostly empty in the CMD_UTILITY case. Furthermore, the output of pg_plan_queries is now always a list of PlannedStmt nodes, even for utility statements. In the case of a utility statement, "planning" just consists of wrapping a CMD_UTILITY PlannedStmt around the utility node. Now, every list of statements has a consistent head-node type depending on how far along it is in processing. * This allows us to keep statement location/length in RawStmt, Query, and PlannedStmt nodes, and not anywhere else. * This results in touching quite a bit of code that is concerned with lists of statements in different formats, but in most places, it's simple changes like replacing generic Node* pointers with specific RawStmt* or PlannedStmt* pointers. I think that's an unalloyed good. * To preserve the new rule that the node passed to top-level parse_analyze() is now always a RawStmt, I made the grammar insert RawStmts in the sub-statements of EXPLAIN, DECLARE CURSOR, etc. This adds a bit of cruft in gram.y but avoids cruft elsewhere. * In addition, I had to change the API of ProcessUtility() so that what it's passed is the wrapper PlannedStmt not just the bare utility statement. This is what allows pg_stat_statements to get at the location info for a utility statement. This will affect all users of ProcessUtility_hook, but the changes are pretty trivial for those that don't care about location info --- see the changes to contrib/sepgsql/hooks.c for an example. * Because PlannedStmt also carries a canSetTag field, we're able to get rid of some ad-hoc rules about how to reconstruct canSetTag for a bare utility statement (specifically, the assumption that a utility is canSetTag if and only if it's the only one in its list). While I see no near-term need for relaxing that, it's nice to get rid of the ad-hocery. * I chose to also rearrange the post-parse-analysis representation of DECLARE CURSOR so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained SELECT remains a child of the DeclareCursorStmt rather than getting flipped around to be the other way. Possibly this patch could have been made to work without that, but the existing code had some weird rules about how the presence of a PlannedStmt where a utility command was expected meant it was DECLARE CURSOR, and I was out to get rid of weird rules like that one. With these changes, it's true for both Query and PlannedStmt that utilityStmt is non-null if and only if commandType is CMD_UTILITY. That allows simplifying a whole lot of places that were testing both fields --- I think a few of those were just defensive programming, but in a lot of them, it was actually necessary to avoid confusing DECLARE CURSOR with SELECT. So this ends up costing one extra palloc per statement, or two extra in the case of a utility statement, but really that's quite negligible compared to everything else that happens in processing a statement. As against that, I think it makes the related code a lot clearer. The sheer size of the patch is a bit more than Fabien's patch, but what it is touching is not per-statement-type code but code that works generically with lists of statements. So I think it's less likely to cause problems for other patches. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
Hello Tom, > Yeah, I doubt that this technique for getting the raw locations in the > grammar+lexer works reliably. It worked reliably on all tests, and the assumption seemed sound to me, given the one-look-ahead property and that statement reductions can only triggered when ';' or end of input comes. > Anyway, I decided to see what it would take to do it the way I had > in mind, which was to stick a separate RawStmt node atop each statement > parsetree. The project turned out a bit larger than I hoped :-(, Indeed: I tried it as it looked elegant, but it did not show to be the simple thing you announced... > [...] Furthermore, the output of pg_plan_queries is now always a list of > PlannedStmt nodes, even for utility statements. I stopped exactly there when I tried: I thought that changing that would be enough to get a reject because it would be considered much too invasive in too many places for fixing a small bug. > So this ends up costing one extra palloc per statement, or two extra > in the case of a utility statement, but really that's quite negligible > compared to everything else that happens in processing a statement. > As against that, I think it makes the related code a lot clearer. Having spent some time there, I agree that a refactoring and cleanup was somehow needed... > The sheer size of the patch is a bit more than Fabien's patch, Yep, nearly twice as big and significantly more complex, but the result is a definite improvement. > but what it is touching is not per-statement-type code but code that > works generically with lists of statements. So I think it's less likely > to cause problems for other patches. Patch applies (with "patch", "git apply" did not like it though), compiles, overall make check ok, pgss make check ok as well. I do not know whether the coverage of pg tests is enough to know whether anything is broken... I noticed that you also improved the pgss fix and tests. Good. I was unsure about removing spaces at the end of the query because of possible encoding issues. The update end trick is nice to deal with empty statements. I wonder whether the sub "query" in Copy, View, CreateAs, Explain, Prepare, Execute, Declare... could be typed RawStmt* instead of Node*. That would avoid some casts in updateRawStmtEnd, but maybe add some other elsewhere. One point bothered me a bit when looking at the initial code, that your refactoring has not changed: the location is about a string which is not accessible from the node, so that the string must be kept elsewhere and passed as a second argument here and there. ISTM that it would make some sense to have the initial string directly available from RawStmt, Query and PlannedStmt. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> [...] Furthermore, the output of pg_plan_queries is now always a list of >> PlannedStmt nodes, even for utility statements. > I stopped exactly there when I tried: I thought that changing that would > be enough to get a reject because it would be considered much too invasive > in too many places for fixing a small bug. Well, I don't know that we'd have bothered to change this without a reason, but now that the effort's been put in, it seems a lot cleaner to me. > I noticed that you also improved the pgss fix and tests. Good. I was > unsure about removing spaces at the end of the query because of possible > encoding issues. It's safe in backend-legal encodings. I agree it could be problematic in, say, SJIS, but we disallow that as a backend encoding precisely because of this type of hazard. > The update end trick is nice to deal with empty statements. I wonder > whether the sub "query" in Copy, View, CreateAs, Explain, Prepare, > Execute, Declare... could be typed RawStmt* instead of Node*. I did that for some of them, eg CopyStmt. The trouble for statements like EXPLAIN is that we replace the sub-statement during parse analysis, so that the field points to either a raw grammar output node or to a Query depending on when you look. If we wanted to get rid of using a Node* field there, we'd have to invent two separate struct types to represent raw EXPLAIN and post-parse-analyze EXPLAIN (analogously to the way that, say, ColumnRef and Var are distinct node types). It doesn't seem worth the trouble to me. > One point bothered me a bit when looking at the initial code, that your > refactoring has not changed: the location is about a string which is not > accessible from the node, so that the string must be kept elsewhere and > passed as a second argument here and there. ISTM that it would make some > sense to have the initial string directly available from RawStmt, Query > and PlannedStmt. Yeah, perhaps, but that would be something to tackle as a separate round of changes I think. This might also tie into the nearby thread about having that string available to pass to parallel workers. The history here is that we didn't use to keep the source string around after parsing, but that's not the case anymore. But we've not entirely accepted that in all the internal APIs ... regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
I wrote: > Fabien COELHO <coelho@cri.ensmp.fr> writes: >> One point bothered me a bit when looking at the initial code, that your >> refactoring has not changed: the location is about a string which is not >> accessible from the node, so that the string must be kept elsewhere and >> passed as a second argument here and there. ISTM that it would make some >> sense to have the initial string directly available from RawStmt, Query >> and PlannedStmt. > Yeah, perhaps, but that would be something to tackle as a separate round > of changes I think. I remembered one reason why we haven't done this: it's unclear how we'd handle copying if we do it. If, say, Query contains a "char *" pointer then you'd expect copyObject() to pstrdup that string, but we really don't want that to happen in scenarios where a single source string generated a large number of Query trees. That's not an academic concern, we've gotten real field bug reports when we broke it --- cf commit 1591fcbec. We'd need to work out a way of managing multiple Queries carrying references to the same source string, and it's not clear how to do that reasonably. So for now, it remains the path of least resistance for Portals and CachedPlans to contain a list of Query or PlannedStmt objects and a separate source string. regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
> [...] it seems a lot cleaner to me. I agree. > [...] It's safe in backend-legal encodings. Good:-) > [...] The trouble for statements like EXPLAIN is that we replace the > sub-statement during parse analysis, [...] Ok. So Node* is fine to keep it generic. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
About having a pointer to the initial string from RawStmt, Query & PlannedStmt: > I remembered one reason why we haven't done this: it's unclear how we'd > handle copying if we do it. If, say, Query contains a "char *" pointer > then you'd expect copyObject() to pstrdup that string, [..., So] We'd > need to work out a way of managing multiple Queries carrying references > to the same source string, and it's not clear how to do that reasonably. For me it would be shared, but then it may break some memory management hypothesis downstream. > So for now, it remains the path of least resistance for Portals and > CachedPlans to contain a list of Query or PlannedStmt objects and a > separate source string. Ok. Thanks for the explanation. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Kyotaro HORIGUCHI
Date:
Hello, It is big, but I think quite reasonable. The refactoring makes many things clearer and I like it. No objection for the patch. The affect of changing some API's are not a problem. At Thu, 12 Jan 2017 19:00:47 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1701121752430.3788@lancre> > Patch applies (with "patch", "git apply" did not like it though), > compiles, overall make check ok, pgss make check ok as well. I do not > know whether the coverage of pg tests is enough to know whether > anything is broken... The same for me. There's some parts that I haven't fully understand, though.. > I noticed that you also improved the pgss fix and tests. Good. I was > unsure about removing spaces at the end of the query because of > possible encoding issues. > > The update end trick is nice to deal with empty statements. I wonder > whether the sub "query" in Copy, View, CreateAs, Explain, Prepare, > Execute, Declare... could be typed RawStmt* instead of Node*. That > would avoid some casts in updateRawStmtEnd, but maybe add some other > elsewhere. > > One point bothered me a bit when looking at the initial code, that > your refactoring has not changed: the location is about a string which > is not accessible from the node, so that the string must be kept > elsewhere and passed as a second argument here and there. ISTM that it > would make some sense to have the initial string directly available > from RawStmt, Query and PlannedStmt. I found an inconsistency (in style, not function:) between copyfuncs and equalfuncs. The patch handles the three members utilityStmt, stmt_location and stmt_len of Query at once and copyfuncs seems to be edited to follow the rule, but in equalfuncs the position of utilityStmt is not moved. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Kyotaro HORIGUCHI
Date:
Hello, At Thu, 12 Jan 2017 20:08:54 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1701122004190.3788@lancre> > > About having a pointer to the initial string from RawStmt, Query & > PlannedStmt: > > > I remembered one reason why we haven't done this: it's unclear how > > we'd handle copying if we do it. If, say, Query contains a "char *" > > pointer then you'd expect copyObject() to pstrdup that string, [..., > > So] We'd need to work out a way of managing multiple Queries carrying > > references to the same source string, and it's not clear how to do > > that reasonably. > > For me it would be shared, but then it may break some memory > management hypothesis downstream. +1 to they have a pointer to the shared query string. But doing that without some measure like reference counting seems difficult.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > I found an inconsistency (in style, not function:) between > copyfuncs and equalfuncs. The patch handles the three members > utilityStmt, stmt_location and stmt_len of Query at once and > copyfuncs seems to be edited to follow the rule, but in > equalfuncs the position of utilityStmt is not moved. I think you're looking at struct Query's instance of utilityStmt. I didn't move that one. (Maybe I should've, but it didn't look quite as randomly dropped into the middle of the struct as the PlannedStmt one did. Just a judgment call in either case.) regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Craig Ringer
Date:
On 13 Jan. 2017 16:35, "Kyotaro HORIGUCHI" <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,
At Thu, 12 Jan 2017 20:08:54 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1701122004190.3788@lancre> >+1 to they have a pointer to the shared query string. But doing
> About having a pointer to the initial string from RawStmt, Query &
> PlannedStmt:
>
> > I remembered one reason why we haven't done this: it's unclear how
> > we'd handle copying if we do it. If, say, Query contains a "char *"
> > pointer then you'd expect copyObject() to pstrdup that string, [...,
> > So] We'd need to work out a way of managing multiple Queries carrying
> > references to the same source string, and it's not clear how to do
> > that reasonably.
>
> For me it would be shared, but then it may break some memory
> management hypothesis downstream.
that without some measure like reference counting seems
difficult..
Sounds like it'd be better as a separate change so as not to block this one.
I really like what you have done Tom, though I'm about to travel so I haven't read it in full detail. Like Fabien I would've been certain that it'd be rejected if I tried it, but I sure am glad you did it.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Craig Ringer <craig.ringer@2ndquadrant.com> writes: > Sounds like it'd be better as a separate change so as not to block this one. Yeah, if we do that at all it should be a separate patch IMO. The concerns are largely different. > I really like what you have done Tom, though I'm about to travel so I > haven't read it in full detail. Like Fabien I would've been certain that > it'd be rejected if I tried it, but I sure am glad you did it. Does anyone want to do further review on this before I push it? One thing that I'm not quite satisfied with is the business with non-top-level RawStmt nodes in some utility statements. That's certainly a wart from gram.y's perspective, and it's mostly a wart from analyze.c's perspective as well --- the parse analyze routines mostly just throw away the non-top-level RawStmt. The original reason for doing it was that DoCopy needs to call pg_analyze_and_rewrite() on the sub-statement, and I wanted pg_analyze_and_rewrite's argument to be declared as RawStmt, so CopyStmt's sub-statement had to have a RawStmt. And it seemed like a good idea for the other utility statements to be consistent with that. However, when all the dust settled, DoCopy ended up needing to construct a RawStmt node anyway for the case where it's inventing a Query tree to support an RLS query. We could make it construct a RawStmt node in the plain copy-from-query case too, and then there needn't be one in the grammar output. So I'm now thinking that it might be better if the grammar produced RawStmt only at top level, and anybody who calls pg_analyze_and_rewrite on sub-sections of a utility statement has to cons up a RawStmt to put at the top of the sub-query. This complicates life a little for utility statements that call parse analysis during execution, but the principle that RawStmt appears only at top level seems attractively simple. We don't nest PlannedStmts either. One thing that connects into this is why do we have some utility statements that do parse analysis of sub-statements immediately during their own parse analysis, while others postpone that to execution? I've not researched it yet, but my vague memory is that EXPLAIN and friends used to all do it in the second way, and we realized that that was broken so we changed them to the first way. COPY has only recently grown the ability to have a sub-query, and I'm wondering if it's just a failure of institutional memory that led us to do it the second way. If we ended up requiring all these cases to work more like EXPLAIN, they'd not be needing to construct RawStmts at execution anyway. regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
> One thing that I'm not quite satisfied with is the business with > non-top-level RawStmt nodes in some utility statements. > a wart from gram.y's perspective, and it's mostly a wart from analyze.c's > perspective as well --- the parse analyze routines mostly just throw away > the non-top-level RawStmt. > > The original reason for doing it was that DoCopy needs to call > pg_analyze_and_rewrite() on the sub-statement, and I wanted > pg_analyze_and_rewrite's argument to be declared as RawStmt, My 0,02€, feel free to ignore them: Personnaly when I had started doing a version I had decided to only change the type at top level, and then I made a few functions being resilient about having a RawStmt (that I had called ParsedStmt in my version) or a direct statement, rather than change the type, I had kept Node*. Now I see the benefit of changing the type, because the compiler will say if there is an issue, and their is no hidden level changes. > So I'm now thinking that it might be better if the grammar produced > RawStmt only at top level, and anybody who calls pg_analyze_and_rewrite > on sub-sections of a utility statement has to cons up a RawStmt to put > at the top of the sub-query. Why not. The lazy programmer I am notices that there seems to be 6 instances, this is not too bad, some of which are already dealt with. The RawStmt may not need to be allocated dynamically, a stack instance could be enough. -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> One thing that I'm not quite satisfied with is the business with >> non-top-level RawStmt nodes in some utility statements. >> ... >> So I'm now thinking that it might be better if the grammar produced >> RawStmt only at top level, and anybody who calls pg_analyze_and_rewrite >> on sub-sections of a utility statement has to cons up a RawStmt to put >> at the top of the sub-query. > Why not. The lazy programmer I am notices that there seems to be 6 > instances, this is not too bad, some of which are already dealt with. The > RawStmt may not need to be allocated dynamically, a stack instance could > be enough. Here's a v2 that does it like that. It ends up being about 30 fewer lines of code overall, despite there being four places that have to make ad-hoc RawStmt nodes. On the whole I feel like this is cleaner, although there's room to argue that. Still, the extra cruft is in places that I'm suspicious are wrong anyway, as I noted. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Fabien COELHO
Date:
> Here's a v2 that does it like that. Applies, compiles, overall & pgss make check are both ok. Personnally I find having RawStmt only for the top parsed statement cleaner, and it significantly reduces changes in the parser, as expected, without inducing too much changes elsewhere. Same comment as on v1: I'm wondering about the actual coverage of the default tests... I cannot say I understand everything. > It ends up being about 30 fewer lines of code overall, despite there > being four places that have to make ad-hoc RawStmt nodes. On the whole > I feel like this is cleaner, I agree: Better typing, more homogeneous code (PlannedStmt for all), less ad-hoc checks to work around utility statements... -- Fabien.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> It ends up being about 30 fewer lines of code overall, despite there >> being four places that have to make ad-hoc RawStmt nodes. On the whole >> I feel like this is cleaner, > I agree: Better typing, more homogeneous code (PlannedStmt for all), > less ad-hoc checks to work around utility statements... OK, pushed like that. regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Craig Ringer
Date:
On 15 January 2017 at 05:18, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fabien COELHO <coelho@cri.ensmp.fr> writes: >>> It ends up being about 30 fewer lines of code overall, despite there >>> being four places that have to make ad-hoc RawStmt nodes. On the whole >>> I feel like this is cleaner, > >> I agree: Better typing, more homogeneous code (PlannedStmt for all), >> less ad-hoc checks to work around utility statements... > > OK, pushed like that. Thanks very much for that Tom, it's great to see this change. One suggestion: it's currently non-obvious that ProcessUtility_hook gets called with the full text of all parts of a multi-statement. I suggest the following wording added to the comment on ProcessUtility() in src/backend/tcop/utility.c, after the note on the query string, or something like it: The same query string may be passed to multiple invocations of ProcessUtility if a utility statement in turn invokes otherutility statements, or if the user supplied a query string containing multiple semicolon-separated statements in a singleprotocol message. It is also possible for the query text to contain other non-utility-statement text like comments,empty statements, and plannable statements. Callers that use the queryString should use pstmt->stmt_location andpstmt->stmt_len to extract the text for the statement of interest and should guard against re-entrant invocation. That should help with at least some of the traps around ProcessUtility_hook, and I certainly wish I'd known it some months ago. For the record, this is commits ab1f0c8225714aaa18d2f9ca4f80cd009f145421 and 83f2061dd037477ec8479ee160367840e203a722 . https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ab1f0c8225714aaa18d2f9ca4f80cd009f145421 https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=83f2061dd037477ec8479ee160367840e203a722 -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training &Services
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Craig Ringer <craig.ringer@2ndquadrant.com> writes: > One suggestion: it's currently non-obvious that ProcessUtility_hook > gets called with the full text of all parts of a multi-statement. OK, we can improve that ... > The same query string may be passed to multiple invocations of ProcessUtility > if a utility statement in turn invokes other utility statements, or if the > user supplied a query string containing multiple semicolon-separated > statements in a single protocol message. It is also possible for the query > text to contain other non-utility-statement text like comments, empty > statements, and plannable statements. Callers that use the queryString > should use pstmt->stmt_location and pstmt->stmt_len to extract the text for > the statement of interest and should guard against re-entrant invocation. Not sure about the reference to re-entrancy. It's not especially relevant to query texts AFAICS, and wouldn't a utility statement know darn well if it was doing something that could end up invoking another instance of itself? regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Craig Ringer
Date:
On 26 January 2017 at 21:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Craig Ringer <craig.ringer@2ndquadrant.com> writes: >> One suggestion: it's currently non-obvious that ProcessUtility_hook >> gets called with the full text of all parts of a multi-statement. > > OK, we can improve that ... > >> The same query string may be passed to multiple invocations of ProcessUtility >> if a utility statement in turn invokes other utility statements, or if the >> user supplied a query string containing multiple semicolon-separated >> statements in a single protocol message. It is also possible for the query >> text to contain other non-utility-statement text like comments, empty >> statements, and plannable statements. Callers that use the queryString >> should use pstmt->stmt_location and pstmt->stmt_len to extract the text for >> the statement of interest and should guard against re-entrant invocation. > > Not sure about the reference to re-entrancy. It's not especially relevant > to query texts AFAICS, and wouldn't a utility statement know darn well if > it was doing something that could end up invoking another instance of > itself? The utility statement does, but the hooks don't necessarily. If you fire an ALTER TABLE ... ADD COLUMN .. ADD COLUMN .. ADD CONSTRAINT ..; for example. However, I was wrong to say we must guard against re-entrancy. We should only enter ProcessUtility once with context == PROCESS_UTILITY_QUERY, and that's what hooks should be looking at rather than keeping track of re-entrant invocations. So perhaps: "The same query string may be passed to multiple invocations of ProcessUtility if a utility statement invokes subcommands (e.g. ALTER TABLE), in which case context will be set to PROCESS_UTILITY_SUBCOMMAND, or if the user supplied a query string containing multiple semicolon-separated statements in a single protocol message. It is also possible for the query text to contain other non-utility-statement text like comments, empty statements, and plannable statements that don't pass through ProcessUtility. Hooks that use the queryString should use pstmt->stmt_location and pstmt->stmt_len to extract the text for the statement of interest and should pay attention to the context to avoid repeatedly handling the same query string in subcommands." -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training &Services
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Craig Ringer <craig.ringer@2ndquadrant.com> writes: > So perhaps: > "The same query string may be passed to multiple invocations of > ProcessUtility if a utility statement invokes subcommands (e.g. ALTER > TABLE), in which case context will be set to > PROCESS_UTILITY_SUBCOMMAND, or if the user supplied a query string > containing multiple semicolon-separated statements in a single > protocol message. It is also possible for the query text to contain > other non-utility-statement text like comments, empty statements, and > plannable statements that don't pass through ProcessUtility. Hooks > that use the queryString should use pstmt->stmt_location and > pstmt->stmt_len to extract the text for the statement of interest and > should pay attention to the context to avoid repeatedly handling the > same query string in subcommands." Meh. I really don't think that a comment about the query string is the place to get into promises that are unrelated to the string, and that ProcessUtility is in no position to enforce anyway. How about we just say this: "The same queryString may be passed to multiple invocations of ProcessUtility when processing a query string containing multiple semicolon-separated statements; one should use pstmt->stmt_location and pstmt->stmt_len to identify the substring containing the current statement. Keep in mind also that some utility statements (e.g., CREATE SCHEMA) will recurse to ProcessUtility to process sub-statements, often passing down the same queryString, stmt_location, and stmt_len that were given for the whole statement." regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Kyotaro HORIGUCHI
Date:
At Thu, 26 Jan 2017 22:34:57 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <23778.1485488097@sss.pgh.pa.us> > Craig Ringer <craig.ringer@2ndquadrant.com> writes: > > So perhaps: > > > "The same query string may be passed to multiple invocations of > > ProcessUtility if a utility statement invokes subcommands (e.g. ALTER > > TABLE), in which case context will be set to > > PROCESS_UTILITY_SUBCOMMAND, or if the user supplied a query string > > containing multiple semicolon-separated statements in a single > > protocol message. It is also possible for the query text to contain > > other non-utility-statement text like comments, empty statements, and > > plannable statements that don't pass through ProcessUtility. Hooks > > that use the queryString should use pstmt->stmt_location and > > pstmt->stmt_len to extract the text for the statement of interest and > > should pay attention to the context to avoid repeatedly handling the > > same query string in subcommands." > > Meh. I really don't think that a comment about the query string is > the place to get into promises that are unrelated to the string, and > that ProcessUtility is in no position to enforce anyway. How about > we just say this: > > "The same queryString may be passed to multiple invocations of > ProcessUtility when processing a query string containing multiple > semicolon-separated statements; one should use pstmt->stmt_location and > pstmt->stmt_len to identify the substring containing the current > statement. Keep in mind also that some utility statements (e.g., > CREATE SCHEMA) will recurse to ProcessUtility to process sub-statements, > often passing down the same queryString, stmt_location, and stmt_len > that were given for the whole statement." Tom's rewrite is easier to read for me and seems telling me enough as a user of the hook. By the way the existing comment for the hook, > * > * We provide a function hook variable that lets loadable plugins get > * control when ProcessUtility is called. Such a plugin would normally > * call standard_ProcessUtility(). > */ This is quite a matter of course for developers. planner_hook and other ones don't have such a comment or have a one-liner at most. Since this hook gets a good amount of commnet, it seems better to just remove the existing one.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Craig Ringer
Date:
On 27 Jan. 2017 14:34, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig.ringer@2ndquadrant.com> writes:Meh. I really don't think that a comment about the query string is
> So perhaps:
> "The same query string may be passed to multiple invocations of
> ProcessUtility if a utility statement invokes subcommands (e.g. ALTER
> TABLE), in which case context will be set to
> PROCESS_UTILITY_SUBCOMMAND, or if the user supplied a query string
> containing multiple semicolon-separated statements in a single
> protocol message. It is also possible for the query text to contain
> other non-utility-statement text like comments, empty statements, and
> plannable statements that don't pass through ProcessUtility. Hooks
> that use the queryString should use pstmt->stmt_location and
> pstmt->stmt_len to extract the text for the statement of interest and
> should pay attention to the context to avoid repeatedly handling the
> same query string in subcommands."
the place to get into promises that are unrelated to the string, and
that ProcessUtility is in no position to enforce anyway. How about
we just say this:
"The same queryString may be passed to multiple invocations of
ProcessUtility when processing a query string containing multiple
semicolon-separated statements; one should use pstmt->stmt_location and
pstmt->stmt_len to identify the substring containing the current
statement. Keep in mind also that some utility statements (e.g.,
CREATE SCHEMA) will recurse to ProcessUtility to process sub-statements,
often passing down the same queryString, stmt_location, and stmt_len
that were given for the whole statement."
Much better wording. I like that.
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Craig Ringer <craig.ringer@2ndquadrant.com> writes: > On 27 Jan. 2017 14:34, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> "The same queryString may be passed to multiple invocations of >> ProcessUtility when processing a query string containing multiple >> semicolon-separated statements; one should use pstmt->stmt_location and >> pstmt->stmt_len to identify the substring containing the current >> statement. Keep in mind also that some utility statements (e.g., >> CREATE SCHEMA) will recurse to ProcessUtility to process sub-statements, >> often passing down the same queryString, stmt_location, and stmt_len >> that were given for the whole statement." > Much better wording. I like that. OK, done that way. regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > By the way the existing comment for the hook, >> * >> * We provide a function hook variable that lets loadable plugins get >> * control when ProcessUtility is called. Such a plugin would normally >> * call standard_ProcessUtility(). >> */ > This is quite a matter of course for developers. planner_hook and > other ones don't have such a comment or have a one-liner at > most. Since this hook gets a good amount of commnet, it seems > better to just remove the existing one.. Hm? I see pretty much the same wording for planner_hook: * To support loadable plugins that monitor or modify planner behavior,* we provide a hook variable that lets a plugin getcontrol before and* after the standard planning process. The plugin would normally call* standard_planner(). I think other places have copied-and-pasted this as well. The real problem with this is it isn't a correct specification of the contract: in most cases, the right thing to do is more like "call the previous occupant of the ProcessUtility_hook, or standard_ProcessUtility if there was none." Not sure if it's worth trying to be more precise in these comments, but if we do something about it, we need to hit all the places with this wording. regards, tom lane
Re: [HACKERS] BUG: pg_stat_statements query normalization issues withcombined queries
From
Craig Ringer
Date:
On 28 Jan. 2017 02:08, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.Hm? I see pretty much the same wording for planner_hook:jp> writes:
> By the way the existing comment for the hook,
>> *
>> * We provide a function hook variable that lets loadable plugins get
>> * control when ProcessUtility is called. Such a plugin would normally
>> * call standard_ProcessUtility().
>> */
> This is quite a matter of course for developers. planner_hook and
> other ones don't have such a comment or have a one-liner at
> most. Since this hook gets a good amount of commnet, it seems
> better to just remove the existing one..
* To support loadable plugins that monitor or modify planner behavior,
* we provide a hook variable that lets a plugin get control before and
* after the standard planning process. The plugin would normally call
* standard_planner().
I think other places have copied-and-pasted this as well.
The real problem with this is it isn't a correct specification of the
contract: in most cases, the right thing to do is more like "call the
previous occupant of the ProcessUtility_hook, or standard_ProcessUtility
if there was none."
Not sure if it's worth trying to be more precise in these comments,
but if we do something about it, we need to hit all the places with
this wording.
I'd rather leave it for the hooks documentation work that's being done elsewhere and have a README.hooks or something that discusses patterns common across hooks. Then we can just reference it.
Otherwise we'll have a lot of repetitive comments. Especially once we mention that you can also ERROR out or (potentially) take no action at all.
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
From
Kyotaro HORIGUCHI
Date:
Mmm.. At Sat, 28 Jan 2017 11:52:20 +0800, Craig Ringer <craig.ringer@2ndquadrant.com> wrote in <CAMsr+YGX0uU5Rw0fOKFwxtg_5WxWOBQ=KiTHWiKrb65H67inwg@mail.gmail.com> > On 28 Jan. 2017 02:08, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > > By the way the existing comment for the hook, > > >> * > >> * We provide a function hook variable that lets loadable plugins get > >> * control when ProcessUtility is called. Such a plugin would normally > >> * call standard_ProcessUtility(). > >> */ > > > This is quite a matter of course for developers. planner_hook and > > other ones don't have such a comment or have a one-liner at > > most. Since this hook gets a good amount of commnet, it seems > > better to just remove the existing one.. > > Hm? I see pretty much the same wording for planner_hook: Sorry, my eyes were very short-ranged. Surely most of them have commnents with very similar phrase. ExplainOneQuery seems to be one exception but I'll call ExplainOneQuery in the hook function, though:p Anyway I don't want to remove the comment in ProcessUtility since now I know that is rather the common case. > * To support loadable plugins that monitor or modify planner behavior, > * we provide a hook variable that lets a plugin get control before and > * after the standard planning process. The plugin would normally call > * standard_planner(). > > I think other places have copied-and-pasted this as well. > > The real problem with this is it isn't a correct specification of the > contract: in most cases, the right thing to do is more like "call the > previous occupant of the ProcessUtility_hook, or standard_ProcessUtility > if there was none." > > Not sure if it's worth trying to be more precise in these comments, > but if we do something about it, we need to hit all the places with > this wording. That seems bad. I'll prefer rather leaving them alone. Modifying them wouldn't be so much gain if many of them already have such comment. > I'd rather leave it for the hooks documentation work that's being done > elsewhere and have a README.hooks or something that discusses patterns > common across hooks. Then we can just reference it. > > Otherwise we'll have a lot of repetitive comments. Especially once we > mention that you can also ERROR out or (potentially) take no action at all. Sorry for the noise.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center