Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries |
Date | |
Msg-id | alpine.DEB.2.20.1612210841370.3892@lancre Whole thread Raw |
In response to | Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
|
List | pgsql-hackers |
Hello Robert & Kyotaro, >> I'm a little doubtful about your proposed fix. However, I haven't >> studied the code, so I don't know what other approach might be better. That is indeed the question! The key point is that the parser parses several statements from a string, but currently there is no clue about where the queries was found in the string. Only a parser may know about what is being parsed so can generate the location information. Now the possible solutions I see are: - the string is split per query before parsing, but this requires at least a lexer... it looks pretty uneffective to haveanother lexing phase involved, even if an existing lexer is reused. - the parser processes one query at a time and keep the "remaining unparse string" in some state that can be queried tocheck up to where it proceeded, but then the result must be stored somewhere and it would make sense that it wouldbe in the statement anyway, just the management of the location information would be outside the parser. Also thatwould add the cost of relaunching the parser, not sure how bad or insignificant this is. This is (2) below. - the parser still generates a List<Node> but keep track of the location of statements doing so, somehow... propably asI outlined. The query string information can be at some way of pointing in the initial string, or the substring itself that could be extracted at some point. I initially suggested the former because this is already what the parser does for some nodes, and because it seems simpler to do so. Extracting the string instead would suggest that the location of tokens within this statement are relative to this string rather than the initial one, but that involves a lot more changes and it is easier to miss something doing so. > That will work and doable, but the more fundamental issue here > seems to be that post_parse_analyze_hook or other similar hooks > are called with a Query incosistent with query_debug_string. Sure. > It is very conveniently used but the name seems to me to be suggesting > that such usage is out of purpose. I'm not sure, though. > > A similar behavior is shown in error messages but this harms far > less than the case of pg_stat_statements. > > ERROR: column "b" does not exist > LINE 1: ...elect * from t where a = 1; select * from t where b = 2; com... > ^ Ah, I wrote this piece of code a long time ago:-) The location is relative to the full string, see comment above, changing that would involve much more changes, and I'm not sure whether it is desirable. Also, think of: SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo; Maybe the context to know which "SELECT * FROM foo" generates an error should be kept. > 1. Let all of the parse node have location in > debug_query_string. (The original proposal) Yep. > 2. Let the parser return once for each statement (like psql > parser) I'm not sure it does... "\;" generates ";" in the output and the psql lexer keeps on lexing. > and corresponding query string is stored in a > varialble other than debug_query_string. I think that would involve many changes because of the way postgres is written, the list is expected and returned by quite a few functions. Moreover query rewriting may generate several queries out of one anyway, so the list would be kept. So I'm rather still in favor of my initial proposal, that is extend the existing location information to statements, not only some tokens. -- Fabien.
pgsql-hackers by date: