Re: Suggestion to add --continue-client-on-abort option to pgbench - Mailing list pgsql-hackers
From | Yugo Nagata |
---|---|
Subject | Re: Suggestion to add --continue-client-on-abort option to pgbench |
Date | |
Msg-id | 20250925162216.9d2f1ba3683f85183510d2c5@sraoss.co.jp Whole thread Raw |
In response to | Re: Suggestion to add --continue-client-on-abort option to pgbench (Fujii Masao <masao.fujii@gmail.com>) |
Responses |
Re: Suggestion to add --continue-client-on-abort option to pgbench
Re: Suggestion to add --continue-client-on-abort option to pgbench Re: Suggestion to add --continue-client-on-abort option to pgbench |
List | pgsql-hackers |
On Thu, 25 Sep 2025 13:49:05 +0900 Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Sep 25, 2025 at 11:17 AM Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > On Thu, 25 Sep 2025 11:09:40 +0900 > > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > > On Thu, 25 Sep 2025 02:19:27 +0900 > > > Fujii Masao <masao.fujii@gmail.com> wrote: > > > > > > > On Tue, Sep 23, 2025 at 11:58 AM Rintaro Ikeda > > > > <ikedarintarof@oss.nttdata.com> wrote: > > > > > I think the issue is a new bug because we have transitioned to CSTATE_ABORT > > > > > immediately after queries failed, without executing discardUntilSync(). > > > > > > Agreed. > > > > > > > If so, the fix in discardUntilSync() should be part of the continue-on-error > > > > patch. The assertion failure fix should be a separate patch, since only > > > > that needs to be backpatched (the failure can also occur in back branches). > > > > > > +1 > > > > > > > > > > > > I've attached a patch that fixes the assertion error. The content of v1 patch by > > > > > Mr. Nagata is also included. I would appreciate it if you review my patch. > > > > > > > + switch (PQresultStatus(res)) > > > > + { > > > > + case PGRES_PIPELINE_SYNC: > > > > + received_sync = true; > > > > > > > > In the PGRES_PIPELINE_SYNC case, PQclear(res) isn't called but should be. > > > > > > > > + case PGRES_FATAL_ERROR: > > > > + PQclear(res); > > > > + goto done; > > > > > > > > I don't think we need goto here. How about this instead? > > > > > > > > ----------------------- > > > > @@ -3525,11 +3525,18 @@ discardUntilSync(CState *st) > > > > * results have been discarded. > > > > */ > > > > st->num_syncs = 0; > > > > - PQclear(res); > > > > break; > > > > } > > > > + /* > > > > + * Stop receiving further results if PGRES_FATAL_ERROR > > > > is returned > > > > + * (e.g., due to a connection failure) before > > > > PGRES_PIPELINE_SYNC, > > > > + * since PGRES_PIPELINE_SYNC will never arrive. > > > > + */ > > > > + else if (PQresultStatus(res) == PGRES_FATAL_ERROR) > > > > + break; > > > > PQclear(res); > > > > } > > > > + PQclear(res); > > > > > > > > /* exit pipeline */ > > > > if (PQexitPipelineMode(st->con) != 1) > > > > ----------------------- > > > > > > I think Fujii-san's version is better because Ikeda-san's version doesn't > > > consider the case where PGRES_PIPELINE_SYNC is followed by another one. > > > In that situation, the loop would terminate without getting NULL, which would > > > causes an error in PQexitPipelineMode(). > > > > > > However, I would like to suggest an alternative solution: checking the connection > > > status when readCommandResponse() returns false. This seems more straightforwad, > > > since the cause of the error can be investigated immediately after it is detected. > > > > > > @@ -4024,7 +4043,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) > > > if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON) > > > st->state = CSTATE_END_COMMAND; > > > } > > > - else if (canRetryError(st->estatus)) > > > + else if (PQstatus(st->con) == CONNECTION_BAD) > > > + st->state = CSTATE_ABORTED; > > > + else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) || > > > + canRetryError(st->estatus)) > > > st->state = CSTATE_ERROR; > > > else > > > st->state = CSTATE_ABORTED; > > > > > > What do you think? > > > > > > > > > Additionally, I noticed that in pipeline mode, the error message reported in > > > readCommandResponse() is lost, because it is reset when PQgetResult() returned > > > NULL to indicate the end of query processing. For example: > > > > > > pgbench: client 0 got an error in command 3 (SQL) of script 0; > > > pgbench: client 1 got an error in command 3 (SQL) of script 0; > > > > > > This can be fixed this by saving the previous error message and using it > > > for the report. After the fix: > > > > > > pgbench: client 0 got an error in command 3 (SQL) of script 0; FATAL: terminating connection due to administratorcommand > > > > > > I've attached update patches. > > > > > > 0001 fixes the assersion failure in commandError() and error message lost > > > in readCommandResponse(). > > > > > > 0002 was the previous 0001 that adds --continiue-on-error, including the > > > fix to handle connection termination errors. > > > > > > 0003 is for improving error messages for errors that cause client abortion. > > > > I think the patch 0001 should be back patched since the issues can occurs > > even for retries of serialization failure or deadlock detection in pipeline mode. > > Agreed. > > Regarding 0001: > > + /* > + Errors should only be detected during an SQL command or the \endpipeline > + meta command. Any other case triggers an assertion failure. > + */ > > * should be added before "Errors" and "meta". Oops. Fixed. > + errmsg = pg_strdup(PQerrorMessage(st->con)); > > It would be good to add a comment explaining why we do this. > > + pg_free(errmsg); > > Shouldn't pg_free() be called also in the error case, i.e., after > jumping to the error label? Yes, it should be. Fixed. I've attached updated patches. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
pgsql-hackers by date: