Thread: statement_timeout logging
Currently, when we set a statement_timeout and a query runs over that time there is no log message to say that the statement has timed out. We do get a message which says ERROR: canceling query due to user request and so in the server log it is impossible to tell the difference between a query that has been cancelled deliberately by the user and a query whose time(out) has come. The timeout causes the query to be cancelled, which is an ERROR, so it is possible that the query will be logged if log_min_error_statement is set to ERROR or below. Again, there is no way to tell the difference between an ERROR statement that has occurred during execution and an ERROR statement that is generated by a syntax error. So, if you do set log_min_error_statement=ERROR then you get swamped by syntax errors rather than being able to see statement timeouts. Logging these queries ought to be a valuable source of tuning information, but as a result of the above, this is not currently the case. For now, I've written a quick patch to the log the query text with a suitable error message. I expect some debate as to whether the line if (log_min_error_statement > WARNING) should be there at all, so comments are sought. --- Longer term, we might handle this differently. If we differentiate between two types of ERROR, at least for log_min_error_statement... - one that prevents the query from ever running (syntax, parse errors) - one that occurs because the query itself gets into difficulty, but not so badly that it constitutes a FATAL error. Lets call it RUNERROR for now. I expect most people to want to log RUNERRORs, but to want to ignore the more normal kind of ERRORs. When/if we can differentiate between levels, I'll reverse out this patch since we'll be able to handle it simply with the new errorlevel. Best Regards, Simon Riggs
Attachment
Simon Riggs wrote: > Currently, when we set a statement_timeout and a query runs over that > time there is no log message to say that the statement has timed out. We > do get a message which says > ERROR: canceling query due to user request > and so in the server log it is impossible to tell the difference between > a query that has been cancelled deliberately by the user and a query > whose time(out) has come. I have updated the message to: errmsg("canceling query due to user request or statement timeout"))); > The timeout causes the query to be cancelled, which is an ERROR, so it > is possible that the query will be logged if log_min_error_statement is > set to ERROR or below. Again, there is no way to tell the difference > between an ERROR statement that has occurred during execution and an > ERROR statement that is generated by a syntax error. So, if you do set > log_min_error_statement=ERROR then you get swamped by syntax errors > rather than being able to see statement timeouts. > > Logging these queries ought to be a valuable source of tuning > information, but as a result of the above, this is not currently the > case. > > For now, I've written a quick patch to the log the query text with a > suitable error message. > > I expect some debate as to whether the line > if (log_min_error_statement > WARNING) > should be there at all, so comments are sought. I don't see why printing the query cancel from a timeout is any more special than a user request for cancel or a simple query error. If users want statements to be printed on error, they will configure things that way, if not, we should not print them. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
pgman wrote: > Simon Riggs wrote: > > Currently, when we set a statement_timeout and a query runs over that > > time there is no log message to say that the statement has timed out. We > > do get a message which says > > ERROR: canceling query due to user request > > and so in the server log it is impossible to tell the difference between > > a query that has been cancelled deliberately by the user and a query > > whose time(out) has come. > > I have updated the message to: > > errmsg("canceling query due to user request or statement timeout"))); Oops. Did we freeze the message strings already for this release? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Fri, Sep 16, 2005 at 03:41:11PM -0400, Bruce Momjian wrote: > pgman wrote: > > I have updated the message to: > > > > errmsg("canceling query due to user request or statement timeout"))); > > Oops. Did we freeze the message strings already for this release? Not yet. I'm not sure I agree with the wording though -- is there no way to tell the difference? -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "No single strategy is always right (Unless the boss says so)" (Larry Wall)
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > I don't see why printing the query cancel from a timeout is any more > special than a user request for cancel or a simple query error. If > users want statements to be printed on error, they will > configure things > that way, if not, we should not print them. The log is for admins, not users. If an admin sets statement_timeout, then it is a good idea to have the SQL logged as a way of gaining performance information for the *admin*, who knows less about what users do, yet is still charged with the need to tune if one user is effecting other users/jobs. That's a different scenario than a user cancelling their query. The user knows they've cancelled it, so they already know the SQL and can ask the admin if they want it faster. Generally the admin doesn't care if a user cancelled a query, since it might be for a whole host of reasons, whereas a query cancelled for statement_timeout has one specific cause. So, I still want this functionality, so the "bad" query is in the logs. Best Regards, Simon Riggs
>Alvaro Herrera > On Fri, Sep 16, 2005 at 03:41:11PM -0400, Bruce Momjian wrote: > > > I have updated the message to: > > > > > > errmsg("canceling query due to user request or > statement timeout"))); > > > > Oops. Did we freeze the message strings already for this release? > > Not yet. > > I'm not sure I agree with the wording though -- is there no > way to tell > the difference? Yes, my patch. :-) The message is generated following receipt of a SIGINT which is sent by kill() after the statement_timeout. My patch was able to generate a different message by logging just before the kill() is called. (Obviously, there isn't much "after" in this circumstance). Best Regards, Simon Riggs
Simon Riggs wrote: > > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > > I don't see why printing the query cancel from a timeout is any more > > special than a user request for cancel or a simple query error. If > > users want statements to be printed on error, they will > > configure things > > that way, if not, we should not print them. > > The log is for admins, not users. > > If an admin sets statement_timeout, then it is a good idea to have the > SQL logged as a way of gaining performance information for the *admin*, > who knows less about what users do, yet is still charged with the need > to tune if one user is effecting other users/jobs. That's a different > scenario than a user cancelling their query. The user knows they've > cancelled it, so they already know the SQL and can ask the admin if they > want it faster. Generally the admin doesn't care if a user cancelled a > query, since it might be for a whole host of reasons, whereas a query > cancelled for statement_timeout has one specific cause. > > So, I still want this functionality, so the "bad" query is in the logs. Well, the user can use use statement_timeout too, so it isn't just the admin. Also, logging queries can be a security issue, so the idea of logging the query on duration timeout by default seems like a bad idea. We can go three ways. We can add a boolean GUC to control printing of the query during a timeout, but that seems like overkill. We can add a new level for log_min_error_statement that is just above error, but that seems confusing. I think the right solution would be to allow log_min_duration_statement to work for canceled queries. Right now, log_min_duration_statement doesn't work for canceled queries because the query never completes to give a final duration and hit the test code. Should that be fixed now or added to the TODO list? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Simon Riggs wrote: > >Alvaro Herrera > > On Fri, Sep 16, 2005 at 03:41:11PM -0400, Bruce Momjian wrote: > > > > I have updated the message to: > > > > > > > > errmsg("canceling query due to user request or > > statement timeout"))); > > > > > > Oops. Did we freeze the message strings already for this release? > > > > Not yet. > > > > I'm not sure I agree with the wording though -- is there no > > way to tell > > the difference? > > Yes, my patch. :-) > > The message is generated following receipt of a SIGINT which is sent by > kill() after the statement_timeout. > > My patch was able to generate a different message by logging just before > the kill() is called. (Obviously, there isn't much "after" in this > circumstance). Your patch just prints a statement "before" printing the cancel message. What we should do is to print the proper cause of the cancel, and I am working on a patch for that. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Fri, 2005-09-16 at 20:48 -0400, Bruce Momjian wrote: > We can go three ways. We can add a boolean GUC to control printing of > the query during a timeout, but that seems like overkill. We can add a > new level for log_min_error_statement that is just above error, but that > seems confusing. I think the right solution would be to allow > log_min_duration_statement to work for canceled queries. Right now, > log_min_duration_statement doesn't work for canceled queries because the > query never completes to give a final duration and hit the test code. > Should that be fixed now or added to the TODO list? The last one seems the right way to go. So, reformat the message at statement_timeout, so that the log looks exactly like log_min_duration_statement: e.g. LOG: statement_timeout has been activated to cancel statement LOG: duration 1625652ms statement SELECT * from bigOne LOG: query has been cancelled by user action Perhaps we should change the message from kill() to be "statement" rather than "query" also... I'd vote fix now, but I guess that seems to be becoming a regular viewpoint from me. Best Regards, Simon Riggs
Simon Riggs wrote: > On Fri, 2005-09-16 at 20:48 -0400, Bruce Momjian wrote: > > We can go three ways. We can add a boolean GUC to control printing of > > the query during a timeout, but that seems like overkill. We can add a > > new level for log_min_error_statement that is just above error, but that > > seems confusing. I think the right solution would be to allow > > log_min_duration_statement to work for canceled queries. Right now, > > log_min_duration_statement doesn't work for canceled queries because the > > query never completes to give a final duration and hit the test code. > > Should that be fixed now or added to the TODO list? > > The last one seems the right way to go. OK. I tested it and it actually works, and I added documentation suggesting its usage. I was a little confused above because the STATEMENT: line is only output to the server logs because of the way elog.c handles a STATEMENT print for log_min_error_statement. It does not output to the client no matter what log_min_messages is set to, and if someone is concerned about that we can fix it. Technically STATEMENT is not a log level message. So, if I do: test=> SET statement_timeout = 1; SET test=> SET log_min_error_statement = 'ERROR'; SET test=> SELECT * FROM pg_class; ERROR: canceling statement due to statement timeout in the logs I see: ERROR: canceling statement due to statement timeout STATEMENT: select * from pg_class; > So, reformat the message at statement_timeout, so that the log looks > exactly like log_min_duration_statement: > > e.g. > > LOG: statement_timeout has been activated to cancel statement > LOG: duration 1625652ms statement SELECT * from bigOne > LOG: query has been cancelled by user action Not really. The problem here is that the last line is wrong --- it was not cancelled by user action. The attached, applied patch adds a "cancel_from_timeout" variable that properly records if the cancel was because of a timeout or user interaction, and displays the proper log message. > Perhaps we should change the message from kill() to be "statement" > rather than "query" also... kill() is the wrong place to print the message. You will see the attached patch does it in a cleaner way. > I'd vote fix now, but I guess that seems to be becoming a regular > viewpoint from me. OK, fixed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: doc/src/sgml/config.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v retrieving revision 1.17 diff -c -c -r1.17 config.sgml *** doc/src/sgml/config.sgml 13 Sep 2005 15:24:56 -0000 1.17 --- doc/src/sgml/config.sgml 19 Sep 2005 16:58:53 -0000 *************** *** 3232,3238 **** <listitem> <para> Abort any statement that takes over the specified number of ! milliseconds. A value of zero (the default) turns off the limitation. </para> </listitem> </varlistentry> --- 3232,3241 ---- <listitem> <para> Abort any statement that takes over the specified number of ! milliseconds. If <varname>log_min_error_statement</> is set to ! <literal>ERROR</> or lower, the statement that timed out will also be ! logged. A value of zero (the default) turns off the ! limitation. </para> </listitem> </varlistentry> Index: src/backend/storage/lmgr/proc.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/storage/lmgr/proc.c,v retrieving revision 1.163 diff -c -c -r1.163 proc.c *** src/backend/storage/lmgr/proc.c 20 Aug 2005 23:26:24 -0000 1.163 --- src/backend/storage/lmgr/proc.c 19 Sep 2005 16:58:55 -0000 *************** *** 78,83 **** --- 78,84 ---- /* Mark these volatile because they can be changed by signal handler */ static volatile bool statement_timeout_active = false; static volatile bool deadlock_timeout_active = false; + volatile bool cancel_from_timeout = false; /* statement_fin_time is valid only if statement_timeout_active is true */ static struct timeval statement_fin_time; *************** *** 1058,1063 **** --- 1059,1065 ---- Assert(!deadlock_timeout_active); statement_fin_time = fin_time; statement_timeout_active = true; + cancel_from_timeout = false; } else if (statement_timeout_active) { *************** *** 1128,1141 **** MemSet(&timeval, 0, sizeof(struct itimerval)); if (setitimer(ITIMER_REAL, &timeval, NULL)) { ! statement_timeout_active = deadlock_timeout_active = false; return false; } #else /* BeOS doesn't have setitimer, but has set_alarm */ if (set_alarm(B_INFINITE_TIMEOUT, B_PERIODIC_ALARM) < 0) { ! statement_timeout_active = deadlock_timeout_active = false; return false; } #endif --- 1130,1147 ---- MemSet(&timeval, 0, sizeof(struct itimerval)); if (setitimer(ITIMER_REAL, &timeval, NULL)) { ! statement_timeout_active = false; ! cancel_from_timeout = false; ! deadlock_timeout_active = false; return false; } #else /* BeOS doesn't have setitimer, but has set_alarm */ if (set_alarm(B_INFINITE_TIMEOUT, B_PERIODIC_ALARM) < 0) { ! statement_timeout_active = false; ! cancel_from_timeout = false; ! deadlock_timeout_active = false; return false; } #endif *************** *** 1146,1152 **** --- 1152,1161 ---- /* Cancel or reschedule statement timeout */ if (is_statement_timeout) + { statement_timeout_active = false; + cancel_from_timeout = false; + } else if (statement_timeout_active) { if (!CheckStatementTimeout()) *************** *** 1179,1184 **** --- 1188,1194 ---- { /* Time to die */ statement_timeout_active = false; + cancel_from_timeout = true; kill(MyProcPid, SIGINT); } else Index: src/backend/tcop/postgres.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v retrieving revision 1.459 diff -c -c -r1.459 postgres.c *** src/backend/tcop/postgres.c 16 Sep 2005 19:31:04 -0000 1.459 --- src/backend/tcop/postgres.c 19 Sep 2005 16:58:58 -0000 *************** *** 1979,1985 **** /* Set statement timeout running, if any */ if (StatementTimeout > 0) enable_sig_alarm(StatementTimeout, true); ! xact_started = true; } } --- 1979,1987 ---- /* Set statement timeout running, if any */ if (StatementTimeout > 0) enable_sig_alarm(StatementTimeout, true); ! else ! cancel_from_timeout = false; ! xact_started = true; } } *************** *** 2203,2211 **** ImmediateInterruptOK = false; /* not idle anymore */ DisableNotifyInterrupt(); DisableCatchupInterrupt(); ! ereport(ERROR, ! (errcode(ERRCODE_QUERY_CANCELED), ! errmsg("canceling query due to user request or statement timeout"))); } /* If we get here, do nothing (probably, QueryCancelPending was reset) */ } --- 2205,2218 ---- ImmediateInterruptOK = false; /* not idle anymore */ DisableNotifyInterrupt(); DisableCatchupInterrupt(); ! if (cancel_from_timeout) ! ereport(ERROR, ! (errcode(ERRCODE_QUERY_CANCELED), ! errmsg("canceling statement due to statement timeout"))); ! else ! ereport(ERROR, ! (errcode(ERRCODE_QUERY_CANCELED), ! errmsg("canceling statement due to user request"))); } /* If we get here, do nothing (probably, QueryCancelPending was reset) */ } Index: src/include/storage/proc.h =================================================================== RCS file: /cvsroot/pgsql/src/include/storage/proc.h,v retrieving revision 1.81 diff -c -c -r1.81 proc.h *** src/include/storage/proc.h 20 Aug 2005 23:26:34 -0000 1.81 --- src/include/storage/proc.h 19 Sep 2005 16:58:59 -0000 *************** *** 117,122 **** --- 117,124 ---- extern int DeadlockTimeout; extern int StatementTimeout; + extern volatile bool cancel_from_timeout; + /* * Function Prototypes Index: src/test/regress/expected/prepared_xacts.out =================================================================== RCS file: /cvsroot/pgsql/src/test/regress/expected/prepared_xacts.out,v retrieving revision 1.2 diff -c -c -r1.2 prepared_xacts.out *** src/test/regress/expected/prepared_xacts.out 16 Sep 2005 19:39:02 -0000 1.2 --- src/test/regress/expected/prepared_xacts.out 19 Sep 2005 16:59:00 -0000 *************** *** 159,165 **** -- pxtest3 should be locked because of the pending DROP set statement_timeout to 1000; SELECT * FROM pxtest3; ! ERROR: canceling query due to user request or statement timeout reset statement_timeout; -- Disconnect, we will continue testing in a different backend \c - --- 159,165 ---- -- pxtest3 should be locked because of the pending DROP set statement_timeout to 1000; SELECT * FROM pxtest3; ! ERROR: canceling statement due to statement timeout reset statement_timeout; -- Disconnect, we will continue testing in a different backend \c - *************** *** 174,180 **** -- pxtest3 should still be locked because of the pending DROP set statement_timeout to 1000; SELECT * FROM pxtest3; ! ERROR: canceling query due to user request or statement timeout reset statement_timeout; -- Commit table creation COMMIT PREPARED 'regress-one'; --- 174,180 ---- -- pxtest3 should still be locked because of the pending DROP set statement_timeout to 1000; SELECT * FROM pxtest3; ! ERROR: canceling statement due to statement timeout reset statement_timeout; -- Commit table creation COMMIT PREPARED 'regress-one';
On Mon, 2005-09-19 at 13:21 -0400, Bruce Momjian wrote: > OK. I tested it and it actually works Gosh, that says a lot about my code quality. I will strive to improve from "actually works" to "works as expected" for future patches. > and I added documentation > suggesting its usage. > ! If <varname>log_min_error_statement</> is set to > ! <literal>ERROR</> or lower, the statement that timed out will also be > Might I suggest "Even if" rather than "if"? Best Regards, Simon Riggs
Simon Riggs wrote: > On Mon, 2005-09-19 at 13:21 -0400, Bruce Momjian wrote: > > > OK. I tested it and it actually works > > Gosh, that says a lot about my code quality. I will strive to improve > from "actually works" to "works as expected" for future patches. > > > and I added documentation > > suggesting its usage. > > > ! If <varname>log_min_error_statement</> is set to > > ! <literal>ERROR</> or lower, the statement that timed out will also be > > > > Might I suggest "Even if" rather than "if"? I don't see why "even if" makes sense here. There is no expectation that log_min_error_statement wouldn't work as documented. I am more suggesting that it is a nifity way to use the option. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073