Re: Suggestion to add --continue-client-on-abort option to pgbench - Mailing list pgsql-hackers

From Yugo Nagata
Subject Re: Suggestion to add --continue-client-on-abort option to pgbench
Date
Msg-id 20250926114442.392de13cbd4e055e92a9e884@sraoss.co.jp
Whole thread Raw
In response to Re: Suggestion to add --continue-client-on-abort option to pgbench  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
On Thu, 25 Sep 2025 17:17:36 +0800
Chao Li <li.evan.chao@gmail.com> wrote:

> Hi Yugo,
> 
> Thanks for the patch. After reviewing it, I got a few small comments:

Thank you for your reviewing and comments.

> > On Sep 25, 2025, at 15:22, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > 
> > -- 
> > Yugo Nagata <nagata@sraoss.co.jp <mailto:nagata@sraoss.co.jp>>
> >
<v13-0003-Improve-error-messages-for-errors-that-cause-cli.patch><v13-0002-Add-continue-on-error-option.patch><v13-0001-Fix-assertion-failure-and-verbose-messages-in-pi.patch>
> 
> 
> 1 - 0001
> ```
> @@ -3265,6 +3271,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
>      PGresult   *res;
>      PGresult   *next_res;
>      int            qrynum = 0;
> +    char       *errmsg;
> ```
> 
> I think we should initialize errmsg to NULL. Compiler won’t auto initialize a local variable. If it happens to not
enterthe while loop, then errmsg will hold a random value, then pg_free(errmsg) will have trouble.
 

I think this initialization is unnecessary, just like for res and next_res.
If the code happens not to enter the while loop, pg_free(errmsg) will not be
called anyway, since the error: label is only reachable from inside the loop.

> 2 - 0002
> ```
> +       <para>
> +        Allows clients to continue their run even if an SQL statement fails due to
> +        errors other than serialization or deadlock. Unlike serialization and deadlock
> +        failures, clients do not retry the same transactions but start new transaction.
> +        This option is useful when your custom script may raise errors due to some
> +        reason like unique constraints violation. Without this option, the client is
> +        aborted after such errors.
> +       </para>
> ```
> 
> A few nit suggestions:
> 
> * “continue their run” => “continue running”

Fixed.

> * “clients to not retry the same transactions but start new transaction” => “clients do not retry the same
transactionbut start a new transaction instead"
 

I see your point. Maybe we could follow Anthonin Bonnefoy's suggestion
to use "proceed to the next transaction", as it may sound a bit more natural.

> * “due to some reason like” => “for reasons such as"

Fixed.

> 3 - 0002
> ```
> +     * Without --continue-on-error:
>       * failed (the number of failed transactions) =
> ```
> 
> Maybe add an empty line after “without” line.

Makes sense. Fixed.
 
> 4 - 0002
> ```
> +     * When --continue-on-error is specified:
> +     *
> +     * failed (number of failed transactions) =
> ```
> 
> Maybe change to “With ---continue-on-error”, which sounds consistent with the previous “without”.

Fixed.

> 5 - 0002
> ```
> +    int64        other_sql_failures; /* number of failed transactions for
> +                                     * reasons other than
> +                                     * serialization/deadlock failure, which
> +                                     * is counted if --continue-on-error is
> +                                     * specified */
> ```
> 
> How about rename this variable to “sql_errors”, which reflects to the new option name.

I think it’s better to keep the current name, since the variable counts failed transactions,
even though that happens to be equivalent to the number of SQL errors. It’s also consistent
with the other variables, serialization_failures and deadlock_failures.

> 6 - 0002
> ```
> @@ -4571,6 +4594,8 @@ getResultString(bool skipped, EStatus estatus)
>                  return "serialization";
>              case ESTATUS_DEADLOCK_ERROR:
>                  return "deadlock";
> +            case ESTATUS_OTHER_SQL_ERROR:
> +                return "other”;
> ```
> 
> I think this can just return “error”. I checked where this function is called, there will not be other words such as
“error”appended.
 

getResultString() is called to get a string that represents the type of error
causing the transaction failure, so simply returning "error" doesn’t seem very
useful.
 
> 7 - 0002
> ```
>      /* it can be non-zero only if max_tries is not equal to one */
> @@ -6569,6 +6602,10 @@ printResults(StatsData *total,
>                                 sstats->deadlock_failures,
>                                 (100.0 * sstats->deadlock_failures /
>                                  script_total_cnt));
> +                        printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
> +                               sstats->other_sql_failures,
> +                               (100.0 * sstats->other_sql_failures /
> +                                script_total_cnt));
> ```
> 
> Do we only want to print this number when “―continue-on-error” is given?

We could do that, but this message is printed only when
--failures-detailed is specified. So I think users would not mind
if it shows that the number of other failures is zero, even when
--continue-on-error is not specified.

I would appreciate hearing other people's opinions on this.


I've attached updated patches that include fixes for some of your
suggestions and for Anthonin Bonnefoy's suggestion on the documentation.

I also split the patch according to Fujii-san's suggestion.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: plan shape work
Next
From: Michael Paquier
Date:
Subject: Re: Remove unused for_all_tables field from AlterPublicationStmt