Thread: Funny representation in pg_stat_statements.query.
Hello, I noticed that pg_stat_statements.query can have funny values. | =# select pg_stat_statements_reset(); | =# select current_timestamp(0); | <snip> | =# select query from pg_stat_statements; | query | ------------------------------------ | select ?(0); | select pg_stat_statements_reset(); The same thing happenes for CURRENT_DATE, CURRENT_TIME, LOCALTIME and LOCALTIMESTAMP which are specially treated, that is, immediately replaced with a combination of a source function and a typecast in gram.y. The story is as follows, At first, for instance, CURRENT_TIMESTAMP(0) is replaced with 'now()::timestamptz(0)' and "CURRENT_TIMESTAMP"'s location is used as that of 'now' and the location of the const '1' of LOCALTIME is used as that of the const '1' for 'timestamptz'. Let's assume the orignal query was, | pos: Query | 0: SELECT | 7: CURRENT_TIMESTAMP( | 25: 0 | : ); This is parsed in gram.y as follows, the location for 'CURRENT_TIMESTAMP' above (7) is used as the location for the CONST "now". | TypeCast { | TypeCast { | A_Const("now") | TypeName {names = ["text"], loc = -1 (undef) } | loc = 7 | } | Typename {names = ["timestamptz"], loc = -1 } | typemods = [Const {1, loc = 25}] | } Then this is transformed into, | FuncExpr { | funcid = 'timestamptz' | args = [ | CoerceViaIO { | arg = Const {type = "text", value = "now", loc = 7 } | loc = -1 | } | Const { type = "int4", value = 1, loc = -1 } | ] | } Finally pg_stat_statements picks the location '7' as a location of some constant and replaces the token there with '?' nevertheless it is originally the location of 'CURRENT_TIMESTAMP' which is not a constant for users. I found two ways to fix this issue. Both of them prevents wrong masking but the original constant parameter ('0') becomes won't be masked. This behavior seems sane enough for the porpose. A. Making pg_stat_statements conscious of the token types to prevent wrong masking. 20140117_remove_needless_location_setting.patch B. Letting gram.y not set the location for the problematic nodes. 20140117_skip_nonconstants_on_nomalization.patch I don't have firm idea which is preferable. Or the possible another solution. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index f0b9507..30da7aa 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -11355,7 +11355,7 @@ func_expr_common_subexpr: * to rely on it.) */ Node *n; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); $$ = makeTypeCast(n,SystemTypeName("date"), -1); } | CURRENT_TIME @@ -11365,7 +11365,7 @@ func_expr_common_subexpr: * See comments for CURRENT_DATE. */ Node *n; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); $$ = makeTypeCast(n,SystemTypeName("timetz"), -1); } | CURRENT_TIME '(' Iconst ')' @@ -11376,7 +11376,7 @@ func_expr_common_subexpr: */ Node *n; TypeName*d; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); d = SystemTypeName("timetz"); d->typmods = list_make1(makeIntConst($3, @3)); $$ = makeTypeCast(n,d, -1); @@ -11397,7 +11397,7 @@ func_expr_common_subexpr: */ Node *n; TypeName*d; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); d = SystemTypeName("timestamptz"); d->typmods = list_make1(makeIntConst($3, @3)); $$ = makeTypeCast(n,d, -1); @@ -11409,7 +11409,7 @@ func_expr_common_subexpr: * See comments for CURRENT_DATE. */ Node *n; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); $$ = makeTypeCast((Node*)n, SystemTypeName("time"), -1); } | LOCALTIME '(' Iconst ')' @@ -11420,7 +11420,7 @@ func_expr_common_subexpr: */ Node *n; TypeName*d; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); d = SystemTypeName("time"); d->typmods = list_make1(makeIntConst($3, @3)); $$ = makeTypeCast((Node*)n, d, -1); @@ -11432,7 +11432,7 @@ func_expr_common_subexpr: * See comments for CURRENT_DATE. */ Node *n; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); $$ = makeTypeCast(n,SystemTypeName("timestamp"), -1); } | LOCALTIMESTAMP '(' Iconst ')' @@ -11443,7 +11443,7 @@ func_expr_common_subexpr: */ Node *n; TypeName*d; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); d = SystemTypeName("timestamp"); d->typmods = list_make1(makeIntConst($3, @3)); $$ = makeTypeCast(n,d, -1); diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 2f069b7..9be9305 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -53,6 +53,7 @@#include "parser/analyze.h"#include "parser/parsetree.h"#include "parser/scanner.h" +#include "parser/gram.h"#include "pgstat.h"#include "storage/fd.h"#include "storage/ipc.h" @@ -2141,6 +2142,12 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query) break; /* out of inner for-loop */ } + /* Skip tokens other than constants */ + if (!(tok == FCONST || tok == SCONST || tok == BCONST || + tok == XCONST || tok == ICONST || tok == NULL_P || + tok == TRUE_P || tok == FALSE_P)) + break; /* out of inner for-loop */ + /* * We now rely on the assumption that flex has placed a zero * byte afterthe text of the current token in scanbuf.
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > Hello, I noticed that pg_stat_statements.query can have funny values. I don't think that's an acceptable reason for lobotomizing the parser's ability to print error cursors, which is what your first patch does (and without even any documentation that would keep someone from changing it back). The second patch might be all right, though I'm a bit disturbed by its dependency on gram.h constants. The numeric values of those change on a regular basis, and who's to say that these are exactly the set of tokens that we care about? In the end, really the cleanest fix for this would be to get rid of the grammar's translation of these special functions into hacky expressions. They ought to get translated into some new node type(s) that could be reverse-listed in standard form by ruleutils.c. I've wanted to do that for years, but never got around to it ... regards, tom lane
Thank you. tgl> > Hello, I noticed that pg_stat_statements.query can have funny values. tgl> tgl> I don't think that's an acceptable reason for lobotomizing the parser's tgl> ability to print error cursors, which is what your first patch does tgl> (and without even any documentation that would keep someone from tgl> changing it back). Yes, Nevertheless I don't think that replacement the keywords with the fuction 'now' cannot results in error and the function 'now' itself cannot throw errors before significant changes is made, of course there is no "never". tgl> The second patch might be all right, though I'm a bit disturbed by its tgl> dependency on gram.h constants. The numeric values of those change on a tgl> regular basis, and who's to say that these are exactly the set of tokens tgl> that we care about? I felt the same anxiety. The code seems largely unstable because mainly of the last reason. There seems no viable (and stable) means to check and/or keep the pertinence of the token set. So I also begged(?) for the third way. tgl> In the end, really the cleanest fix for this would be to get rid of the tgl> grammar's translation of these special functions into hacky expressions. tgl> They ought to get translated into some new node type(s) that could be tgl> reverse-listed in standard form by ruleutils.c. I've wanted to do that tgl> for years, but never got around to it ... Yes, that's what somewhat worried me. That way enables us to use the lexical locations without extra care. (although also seems to be a bit too heavy labor for the gain..) - CURRENT_TIME and the like are parsed into the nodes directly represents the node specs in gram.y - add transform.. uh.. transformDatetimeFuncs or something to transform the nodes into executable form, perhaps. But it should be after pg_stat_statements refer to the query tree. - modify ruleutils.c in corresponding part if needed. Since this becomes more than a simple bug fix, I think it is too late for 9.4 now. I'll work on this in the longer term. Thanks for the suggestion. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, I found it would be reasonably fixed by small modifications. > - CURRENT_TIME and the like are parsed into the nodes directly > represents the node specs in gram.y <blah, blah> > Since this becomes more than a simple bug fix, I think it is too > late for 9.4 now. I'll work on this in the longer term. The fundamental cause of this issue is Const node which conveys the location of other than constant tokens. Any other nodes, for instance TypeCasts, are safe. So this is fixed by quite simple way like following getting rid of the referred difficulties of keeping the code sane and total loss of token locations. (white spaces are collapsed for readability) | --- a/src/backend/parser/gram.y | +++ b/src/backend/parser/gram.y | @@ -11355,8 +11355,8 @@ func_expr_common_subexpr: | * to rely on it.) | */ | Node *n; | - n = makeStringConstCast("now", @1, SystemTypeName("text")); | - $$ = makeTypeCast(n, SystemTypeName("date"), -1); | + n = makeStringConstCast("now", -1, SystemTypeName("text")); | + $$ = makeTypeCast(n, SystemTypeName("date"), @1); Any suggestions? Attached is the complete patch. As of 9.4dev, transformTypeCast passes the locations retrieved from the source TypeCast node itself or its TypeName subnode to the newly created CoerceViaIO node but leaves that of the inner expressions (the "now", I mean) as it is, so I think this fits to what transformTypeCast does more than before. The query tree after the transformation sees like follows. I think this is preferable to that -1 to CoerceViaIO and 7 to Const. ... { type = TargetEtnry expr = { type = CoerceViaIO, location = 7, arg = { type = Const, location = -1 } } } In addition, the location stored in StringConstCast seems currently not having no route to users' sight, even any step out the node - in short 'useless'. And yet the location of CoerceViaIO is also not used... | | CURRENT_DATE .. |- n = makeStringConstCast("now", @1, SystemTypeName("text") |+ n = makeStringConstCast("nowe", @1, SystemTypeName("text") yields only this, including logs. | =# select CURRENT_DATE; | ERROR: invalid input syntax for type date: "nowe" regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 12a6beb..a14a381 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -11399,8 +11399,8 @@ func_expr_common_subexpr: * to rely on it.) */ Node *n; - n = makeStringConstCast("now", @1, SystemTypeName("text")); - $$ = makeTypeCast(n, SystemTypeName("date"), -1); + n = makeStringConstCast("now", -1, SystemTypeName("text")); + $$ = makeTypeCast(n, SystemTypeName("date"), @1); } | CURRENT_TIME { @@ -11409,8 +11409,8 @@ func_expr_common_subexpr: * See comments for CURRENT_DATE. */ Node *n; - n = makeStringConstCast("now", @1, SystemTypeName("text")); - $$ = makeTypeCast(n, SystemTypeName("timetz"), -1); + n = makeStringConstCast("now", -1, SystemTypeName("text")); + $$ = makeTypeCast(n, SystemTypeName("timetz"), @1); } | CURRENT_TIME '(' Iconst')' { @@ -11420,10 +11420,10 @@ func_expr_common_subexpr: */ Node *n; TypeName *d; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); d = SystemTypeName("timetz"); d->typmods = list_make1(makeIntConst($3, @3)); - $$ = makeTypeCast(n, d, -1); + $$ = makeTypeCast(n, d, @1); } | CURRENT_TIMESTAMP { @@ -11441,10 +11441,10 @@ func_expr_common_subexpr: */ Node *n; TypeName *d; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); d = SystemTypeName("timestamptz"); d->typmods = list_make1(makeIntConst($3, @3)); - $$ = makeTypeCast(n, d, -1); + $$ = makeTypeCast(n, d, @1); } | LOCALTIME { @@ -11453,8 +11453,8 @@ func_expr_common_subexpr: * See comments for CURRENT_DATE. */ Node *n; - n = makeStringConstCast("now", @1, SystemTypeName("text")); - $$ = makeTypeCast((Node *)n, SystemTypeName("time"), -1); + n = makeStringConstCast("now", -1, SystemTypeName("text")); + $$ = makeTypeCast((Node *)n, SystemTypeName("time"), @1); } | LOCALTIME '('Iconst ')' { @@ -11464,10 +11464,10 @@ func_expr_common_subexpr: */ Node *n; TypeName *d; - n = makeStringConstCast("now", @1, SystemTypeName("text")); + n = makeStringConstCast("now", -1, SystemTypeName("text")); d = SystemTypeName("time"); d->typmods = list_make1(makeIntConst($3, @3)); - $$ = makeTypeCast((Node *)n, d, -1); + $$ = makeTypeCast((Node *)n, d, @1); } | LOCALTIMESTAMP { @@ -11476,8 +11476,8 @@ func_expr_common_subexpr: * See comments for CURRENT_DATE. */ Node *n; - n = makeStringConstCast("now", @1, SystemTypeName("text")); - $$ = makeTypeCast(n, SystemTypeName("timestamp"), -1); + n = makeStringConstCast("now", -1, SystemTypeName("text")); + $$ = makeTypeCast(n, SystemTypeName("timestamp"), @1); } | LOCALTIMESTAMP'(' Iconst ')' {
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: >> - CURRENT_TIME and the like are parsed into the nodes directly >> represents the node specs in gram.y >> <blah, blah> >> Since this becomes more than a simple bug fix, I think it is too >> late for 9.4 now. I'll work on this in the longer term. OK. I will get around to it someday, but if you do it first, that's fine. > The fundamental cause of this issue is Const node which conveys > the location of other than constant tokens. Any other nodes, for > instance TypeCasts, are safe. > So this is fixed by quite simple way like following getting rid > of the referred difficulties of keeping the code sane and total > loss of token locations. (white spaces are collapsed for readability) Huh, that's a cute idea. It's certainly a kluge with a capital K, and might fail to extend to new cases; but if we're intending to replace all these cases with new special-purpose parse nodes soon, neither of those objections seem very strong. The main concern I'd have would be whether there could be any weird change in error cursor locations, but right offhand that's probably not a problem. We know in each of these cases that there will be some node produced by the coercion step, so the location won't disappear entirely, and so exprLocation() of the node tree as a whole should be the same. Not labeling the A_Const could result in failure to produce an error location for a complaint about invalid input for the coercion --- but surely that should never happen in any of these cases. So right offhand I think this is probably workable, and while it's ugly it's an appropriate amount of effort to put into code whose days are numbered anyhow. regards, tom lane
On Tue, Jan 21, 2014 at 12:30 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > The fundamental cause of this issue is Const node which conveys > the location of other than constant tokens. Any other nodes, for > instance TypeCasts, are safe. > > So this is fixed by quite simple way like following getting rid > of the referred difficulties of keeping the code sane and total > loss of token locations. (white spaces are collapsed for readability) There was a more obvious case of this that I noticed during the development of pg_stat_statements query normalization. As you may have noticed, that was addressed by this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5d3fcc4c2e137417ef470d604fee5e452b22f6a7 -- Peter Geoghegan
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > The fundamental cause of this issue is Const node which conveys > the location of other than constant tokens. Any other nodes, for > instance TypeCasts, are safe. > So this is fixed by quite simple way like following getting rid > of the referred difficulties of keeping the code sane and total > loss of token locations. (white spaces are collapsed for readability) Committed with minor adjustments (improved comments, and hit one additional case you missed) regards, tom lane
Hi, considering on pg_stat_statements itself, pg> There was a more obvious case of this that I noticed during the pg> development of pg_stat_statements query normalization. As you may have pg> noticed, that was addressed by this commit: pg> pg> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5d3fcc4c2e137417ef470d604fee5e452b22f6a7 Not at all :-) mmm.. Can there be more? By the way, another kind of subtle behavior of pg_stat_statements I've seen but you don't seem to have noticed or appreciated is that about the function simplification in planner. On the current 9.4dev head on master, pg_stat_statements behaves as below, =# select pg_stat_statements_reset(); =# select log(1.5); =# select query from pg_stat_statements; query ------------------------------------select pg_catalog.log(?, $1)select log(?);select pg_stat_statements_reset(); This apparently seems that the query passing through pg_stat_statements twice during single execution. Actually, the first one is a by-product of planning (simplify_function) and the second is just what expected by users. It surely not a problem after knowing what is occurring but it should seem quite strange for those who don't know of that. Of couse there might be others than this simplification executing internally-generated query during planning, too, yet I haven't noticed. It's puzzling how to deal with it. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, tgl> > So this is fixed by quite simple way like following getting rid tgl> > of the referred difficulties of keeping the code sane and total tgl> > loss of token locations. (white spaces are collapsed for readability) tgl> tgl> Committed with minor adjustments (improved comments, and hit one tgl> additional case you missed) mmm. Thank you for your helping as usual. Regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Jan 21, 2014 at 7:29 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > This apparently seems that the query passing through > pg_stat_statements twice during single execution. Actually, the > first one is a by-product of planning (simplify_function) and the > second is just what expected by users. It surely not a problem > after knowing what is occurring but it should seem quite strange > for those who don't know of that. Of couse there might be others > than this simplification executing internally-generated query > during planning, too, yet I haven't noticed. It's puzzling how to > deal with it. I was aware of the fact that those could get in there from planning. Another fun trick is that a pg_stat_statements_reset() containing query is perhaps recorded as a new entry for the parse analysis hook, and always as a new entry for the executor hook. Because, of course, it vanishes in the interim. So "select 1,2,3, pg_stat_statements_reset();" will never have any '?' characters for constant values. In practice everything mostly works out. I use pg_stat_statements quite a bit, and I only very occasionally see weirdness like complete failure to normalize a query string (it is a limitation of pg_stat_statements that this can happen with very low probability). It's a bit of a kludge, but the query string is always just strictly speaking a convenience to help the user understand what the entry is. Although, I suppose, most people don't realize that. -- Peter Geoghegan
Hello, > >> Since this becomes more than a simple bug fix, I think it is too > >> late for 9.4 now. I'll work on this in the longer term. > > OK. I will get around to it someday, but if you do it first, that's fine. Nevertheless of my words, the drive is getting diminished after the issue was settled:-p Anyway I'll keep this in my mind. > > The fundamental cause of this issue is Const node which conveys > > the location of other than constant tokens. Any other nodes, for > > instance TypeCasts, are safe. > > > So this is fixed by quite simple way like following getting rid > > of the referred difficulties of keeping the code sane and total > > loss of token locations. (white spaces are collapsed for readability) > > Huh, that's a cute idea. It's certainly a kluge with a capital K, > and might fail to extend to new cases; but if we're intending to > replace all these cases with new special-purpose parse nodes soon, > neither of those objections seem very strong. Yes, but reforming parser so as to use special-purpose nodes impacts largely so this measure was brought up for the offhand. > The main concern I'd have would be whether there could be any weird > change in error cursor locations, but right offhand that's probably > not a problem. We know in each of these cases that there will be > some node produced by the coercion step, so the location won't > disappear entirely, and so exprLocation() of the node tree as a > whole should be the same. Not labeling the A_Const could result in > failure to produce an error location for a complaint about invalid > input for the coercion --- but surely that should never happen in > any of these cases. Surely. > So right offhand I think this is probably workable, and while it's > ugly it's an appropriate amount of effort to put into code whose > days are numbered anyhow. Thank you. reagrds, -- Kyotaro Horiguchi NTT Open Source Software Center
Kyotaro HORIGUCHI wrote: > Hello, > > > >> Since this becomes more than a simple bug fix, I think it is too > > >> late for 9.4 now. I'll work on this in the longer term. > > > > OK. I will get around to it someday, but if you do it first, that's fine. > > Nevertheless of my words, the drive is getting diminished after > the issue was settled:-p Anyway I'll keep this in my mind. This is a common theme with patch submitters: once you accept a good enough workaround, their motivation to write a real fix diminishes considerably, and as a result kluges live on. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services