Re: Suggestion to add --continue-client-on-abort option to pgbench - Mailing list pgsql-hackers
| From | Yugo Nagata |
|---|---|
| Subject | Re: Suggestion to add --continue-client-on-abort option to pgbench |
| Date | |
| Msg-id | 20250930102353.e368966be54d568e87e3ea95@sraoss.co.jp Whole thread Raw |
| In response to | Re: Suggestion to add --continue-client-on-abort option to pgbench (Yugo Nagata <nagata@sraoss.co.jp>) |
| Responses |
Re: Suggestion to add --continue-client-on-abort option to pgbench
|
| List | pgsql-hackers |
On Fri, 26 Sep 2025 11:44:42 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
> On Thu, 25 Sep 2025 17:17:36 +0800
> Chao Li <li.evan.chao@gmail.com> wrote:
>
> > Hi Yugo,
> >
> > Thanks for the patch. After reviewing it, I got a few small comments:
>
> Thank you for your reviewing and comments.
>
> > > On Sep 25, 2025, at 15:22, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > >
> > > --
> > > Yugo Nagata <nagata@sraoss.co.jp <mailto:nagata@sraoss.co.jp>>
> > >
<v13-0003-Improve-error-messages-for-errors-that-cause-cli.patch><v13-0002-Add-continue-on-error-option.patch><v13-0001-Fix-assertion-failure-and-verbose-messages-in-pi.patch>
> >
> >
> > 1 - 0001
> > ```
> > @@ -3265,6 +3271,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
> > PGresult *res;
> > PGresult *next_res;
> > int qrynum = 0;
> > + char *errmsg;
> > ```
> >
> > I think we should initialize errmsg to NULL. Compiler won’t auto initialize a local variable. If it happens to not
enterthe while loop, then errmsg will hold a random value, then pg_free(errmsg) will have trouble.
>
> I think this initialization is unnecessary, just like for res and next_res.
> If the code happens not to enter the while loop, pg_free(errmsg) will not be
> called anyway, since the error: label is only reachable from inside the loop.
>
> > 2 - 0002
> > ```
> > + <para>
> > + Allows clients to continue their run even if an SQL statement fails due to
> > + errors other than serialization or deadlock. Unlike serialization and deadlock
> > + failures, clients do not retry the same transactions but start new transaction.
> > + This option is useful when your custom script may raise errors due to some
> > + reason like unique constraints violation. Without this option, the client is
> > + aborted after such errors.
> > + </para>
> > ```
> >
> > A few nit suggestions:
> >
> > * “continue their run” => “continue running”
>
> Fixed.
>
> > * “clients to not retry the same transactions but start new transaction” => “clients do not retry the same
transactionbut start a new transaction instead"
>
> I see your point. Maybe we could follow Anthonin Bonnefoy's suggestion
> to use "proceed to the next transaction", as it may sound a bit more natural.
>
> > * “due to some reason like” => “for reasons such as"
>
> Fixed.
>
> > 3 - 0002
> > ```
> > + * Without --continue-on-error:
> > * failed (the number of failed transactions) =
> > ```
> >
> > Maybe add an empty line after “without” line.
>
> Makes sense. Fixed.
>
> > 4 - 0002
> > ```
> > + * When --continue-on-error is specified:
> > + *
> > + * failed (number of failed transactions) =
> > ```
> >
> > Maybe change to “With ---continue-on-error”, which sounds consistent with the previous “without”.
>
> Fixed.
>
> > 5 - 0002
> > ```
> > + int64 other_sql_failures; /* number of failed transactions for
> > + * reasons other than
> > + * serialization/deadlock failure, which
> > + * is counted if --continue-on-error is
> > + * specified */
> > ```
> >
> > How about rename this variable to “sql_errors”, which reflects to the new option name.
>
> I think it’s better to keep the current name, since the variable counts failed transactions,
> even though that happens to be equivalent to the number of SQL errors. It’s also consistent
> with the other variables, serialization_failures and deadlock_failures.
>
> > 6 - 0002
> > ```
> > @@ -4571,6 +4594,8 @@ getResultString(bool skipped, EStatus estatus)
> > return "serialization";
> > case ESTATUS_DEADLOCK_ERROR:
> > return "deadlock";
> > + case ESTATUS_OTHER_SQL_ERROR:
> > + return "other”;
> > ```
> >
> > I think this can just return “error”. I checked where this function is called, there will not be other words such
as“error” appended.
>
> getResultString() is called to get a string that represents the type of error
> causing the transaction failure, so simply returning "error" doesn’t seem very
> useful.
>
> > 7 - 0002
> > ```
> > /* it can be non-zero only if max_tries is not equal to one */
> > @@ -6569,6 +6602,10 @@ printResults(StatsData *total,
> > sstats->deadlock_failures,
> > (100.0 * sstats->deadlock_failures /
> > script_total_cnt));
> > + printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
> > + sstats->other_sql_failures,
> > + (100.0 * sstats->other_sql_failures /
> > + script_total_cnt));
> > ```
> >
> > Do we only want to print this number when “―continue-on-error” is given?
>
> We could do that, but this message is printed only when
> --failures-detailed is specified. So I think users would not mind
> if it shows that the number of other failures is zero, even when
> --continue-on-error is not specified.
>
> I would appreciate hearing other people's opinions on this.
>
>
> I've attached updated patches that include fixes for some of your
> suggestions and for Anthonin Bonnefoy's suggestion on the documentation.
>
> I also split the patch according to Fujii-san's suggestion.
Fujii-san, thank you for committing the patch that fixes the assertion failure.
I've attached the remaining patches so that cfbot stays green.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachment
pgsql-hackers by date: