Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
Date
Msg-id alpine.DEB.2.20.1612210841370.3892@lancre
Whole thread Raw
In response to Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] BUG: pg_stat_statements query normalization issueswith combined queries
List pgsql-hackers
Hello Robert & Kyotaro,

>> I'm a little doubtful about your proposed fix.  However, I haven't 
>> studied the code, so I don't know what other approach might be better.

That is indeed the question!

The key point is that the parser parses several statements from a string, 
but currently there is no clue about where the queries was found in the 
string. Only a parser may know about what is being parsed so can generate 
the location information. Now the possible solutions I see are:
 - the string is split per query before parsing, but this requires at   least a lexer... it looks pretty uneffective to
haveanother lexing   phase involved, even if an existing lexer is reused.
 
 - the parser processes one query at a time and keep the "remaining   unparse string" in some state that can be queried
tocheck up to   where it proceeded, but then the result must be stored somewhere   and it would make sense that it
wouldbe in the statement anyway,   just the management of the location information would be outside   the parser. Also
thatwould add the cost of relaunching the parser,   not sure how bad or insignificant this is. This is (2) below.
 
 - the parser still generates a List<Node> but keep track of the location   of statements doing so, somehow... propably
asI outlined.
 

The query string information can be at some way of pointing in the initial
string, or the substring itself that could be extracted at some point.
I initially suggested the former because this is already what the parser 
does for some nodes, and because it seems simpler to do so.

Extracting the string instead would suggest that the location of tokens
within this statement are relative to this string rather than the initial 
one, but that involves a lot more changes and it is easier to miss 
something doing so.

> That will work and doable, but the more fundamental issue here
> seems to be that post_parse_analyze_hook or other similar hooks
> are called with a Query incosistent with query_debug_string.

Sure.

> It is very conveniently used but the name seems to me to be suggesting 
> that such usage is out of purpose. I'm not sure, though.
>
> A similar behavior is shown in error messages but this harms far
> less than the case of pg_stat_statements.
>
> ERROR:  column "b" does not exist
> LINE 1: ...elect * from t where a = 1; select * from t where b = 2; com...
>                                                             ^

Ah, I wrote this piece of code a long time ago:-) The location is relative 
to the full string, see comment above, changing that would involve much
more changes, and I'm not sure whether it is desirable.

Also, think of:

SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo;

Maybe the context to know which "SELECT * FROM foo" generates an error 
should be kept.

> 1. Let all of the parse node have location in
>  debug_query_string. (The original proposal)

Yep.

> 2. Let the parser return once for each statement (like psql
>   parser)

I'm not sure it does... "\;" generates ";" in the output and the psql 
lexer keeps on lexing.

> and corresponding query string is stored in a
>   varialble other than debug_query_string.

I think that would involve many changes because of the way postgres is 
written, the list is expected and returned by quite a few functions. 
Moreover query rewriting may generate several queries out of one anyway, 
so the list would be kept.

So I'm rather still in favor of my initial proposal, that is extend the 
existing location information to statements, not only some tokens.

-- 
Fabien.



pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] asynchronous execution
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] Creating a DSA area to provide work space forparallel execution