Re: Suggestion to add --continue-client-on-abort option to pgbench - Mailing list pgsql-hackers
From | Yugo Nagata |
---|---|
Subject | Re: Suggestion to add --continue-client-on-abort option to pgbench |
Date | |
Msg-id | 20250925110940.ebacc31725758ec47d5432c6@sraoss.co.jp Whole thread Raw |
In response to | Re: Suggestion to add --continue-client-on-abort option to pgbench (Fujii Masao <masao.fujii@gmail.com>) |
Responses |
Re: Suggestion to add --continue-client-on-abort option to pgbench
|
List | pgsql-hackers |
On Thu, 25 Sep 2025 02:19:27 +0900 Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Sep 23, 2025 at 11:58 AM Rintaro Ikeda > <ikedarintarof@oss.nttdata.com> wrote: > > I think the issue is a new bug because we have transitioned to CSTATE_ABORT > > immediately after queries failed, without executing discardUntilSync(). Agreed. > If so, the fix in discardUntilSync() should be part of the continue-on-error > patch. The assertion failure fix should be a separate patch, since only > that needs to be backpatched (the failure can also occur in back branches). +1 > > > I've attached a patch that fixes the assertion error. The content of v1 patch by > > Mr. Nagata is also included. I would appreciate it if you review my patch. > + switch (PQresultStatus(res)) > + { > + case PGRES_PIPELINE_SYNC: > + received_sync = true; > > In the PGRES_PIPELINE_SYNC case, PQclear(res) isn't called but should be. > > + case PGRES_FATAL_ERROR: > + PQclear(res); > + goto done; > > I don't think we need goto here. How about this instead? > > ----------------------- > @@ -3525,11 +3525,18 @@ discardUntilSync(CState *st) > * results have been discarded. > */ > st->num_syncs = 0; > - PQclear(res); > break; > } > + /* > + * Stop receiving further results if PGRES_FATAL_ERROR > is returned > + * (e.g., due to a connection failure) before > PGRES_PIPELINE_SYNC, > + * since PGRES_PIPELINE_SYNC will never arrive. > + */ > + else if (PQresultStatus(res) == PGRES_FATAL_ERROR) > + break; > PQclear(res); > } > + PQclear(res); > > /* exit pipeline */ > if (PQexitPipelineMode(st->con) != 1) > ----------------------- I think Fujii-san's version is better because Ikeda-san's version doesn't consider the case where PGRES_PIPELINE_SYNC is followed by another one. In that situation, the loop would terminate without getting NULL, which would causes an error in PQexitPipelineMode(). However, I would like to suggest an alternative solution: checking the connection status when readCommandResponse() returns false. This seems more straightforwad, since the cause of the error can be investigated immediately after it is detected. @@ -4024,7 +4043,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON) st->state = CSTATE_END_COMMAND; } - else if (canRetryError(st->estatus)) + else if (PQstatus(st->con) == CONNECTION_BAD) + st->state = CSTATE_ABORTED; + else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) || + canRetryError(st->estatus)) st->state = CSTATE_ERROR; else st->state = CSTATE_ABORTED; What do you think? Additionally, I noticed that in pipeline mode, the error message reported in readCommandResponse() is lost, because it is reset when PQgetResult() returned NULL to indicate the end of query processing. For example: pgbench: client 0 got an error in command 3 (SQL) of script 0; pgbench: client 1 got an error in command 3 (SQL) of script 0; This can be fixed this by saving the previous error message and using it for the report. After the fix: pgbench: client 0 got an error in command 3 (SQL) of script 0; FATAL: terminating connection due to administrator command I've attached update patches. 0001 fixes the assersion failure in commandError() and error message lost in readCommandResponse(). 0002 was the previous 0001 that adds --continiue-on-error, including the fix to handle connection termination errors. 0003 is for improving error messages for errors that cause client abortion. Regareds, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
pgsql-hackers by date: