Re: Suggestion to add --continue-client-on-abort option to pgbench - Mailing list pgsql-hackers

From Rintaro Ikeda
Subject Re: Suggestion to add --continue-client-on-abort option to pgbench
Date
Msg-id 2347f414-ca32-4ab1-9de1-fd719a0f7e0e@oss.nttdata.com
Whole thread Raw
In response to Re: Suggestion to add --continue-client-on-abort option to pgbench  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Hi,

On 2025/09/22 11:56, Fujii Masao wrote:
> On Sat, Sep 20, 2025 at 12:21 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>>> While testing, I found that running pgbench with --continue-on-error and
>>> pipeline mode triggers the following assertion failure. Could this be
>>> a bug in the patch?
>>>
>>> ---------------------------------------------------
>>> $ cat pipeline.pgbench
>>> \startpipeline
>>> DO $$
>>>   BEGIN
>>>     PERFORM pg_sleep(3);
>>>     PERFORM pg_terminate_backend(pg_backend_pid());
>>>   END $$;
>>> \endpipeline
>>>
>>> $ pgbench -n --debug --verbose-errors -f pipeline.pgbench -c 2 -t 4 -M
>>> extended --continue-on-error
>>> ...
>>> Assertion failed:
>>> (sql_script[st->use_file].commands[st->command]->type == 1), function
>>> commandError, file pgbench.c, line 3081.
>>> Abort trap: 6
>>> ---------------------------------------------------
>>>
>>> When I ran the same command without --continue-on-error,
>>> the assertion failure did not occur.
>>
>> I think this bug was introduced by commit 4a39f87acd6e, which enabled pgbench
>> to retry and added the --verbose-errors option, rather than by this patch itself.
>>
>> The assertion failure occurs in commandError(), which is called to report an error when
>> it can be retried (i.e., serializable failure or deadlock), or when --continue-on-error
>> is used after this patch.
>>
>>  Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
>>
>> This assumes the error is always detected during SQL command execution, but
>> that’s not correct, since in pipeline mode, the error can be detected when
>> a \endpipeline meta-command is executed.
>>
>>  $ cat deadlock.sql
>>  \startpipeline
>>  begin;
>>  lock b;
>>  lock a;
>>  end;
>>  \endpipeline
>>
>>  $ cat deadlock2.sql
>>  \startpipeline
>>  begin;
>>  lock a;
>>  lock b;
>>  end;
>>  \endpipeline
>>
>>  $ pgbench --verbose-errors -f deadlock.sql  -f deadlock2.sql -c 2 -T 3 -M extended
>>  pgbench (19devel)
>>  starting vacuum...end.
>>  pgbench: pgbench.c:3062: commandError: Assertion `sql_script[st->use_file].commands[st->command]->type == 1'
failed.
>>
>> Although one option would be to remove this assertion, if we prefer to keep it,
>> the attached patch fixes the issue.
>
> Thanks for the analysis and the patch!
>
> I think we should fix the issue rather than just removing the assertion.
> I'd like to apply your patch with the following source comment:
>
> ---------------------------
> Errors should only be detected during an SQL command or the \endpipeline
> meta command. Any other case triggers an assertion failure.
> --------------------------
>
>
> With your patch and the continue-on-error patches, running the same pgbench
> command I used to reproduce the assertion failure upthread causes pgbench
> to hang. From my analysis, it enters an infinite loop in discardUntilSync().
> That loop waits for PGRES_PIPELINE_SYNC, but since the connection has already
> been closed, it never arrives, leaving pgbench stuck.
>
> Could this also happen without the continue-on-error patch, or is it a new bug
> introduced by it? Either way, it seems pgbench needs to exit the loop when
> the result status is PGRES_FATAL_ERROR.
>


Thank you for the analysis and the patches.

I think the issue is a new bug because we have transitioned to CSTATE_ABORT
immediately after queries failed, without executing discardUntilSync().

I've attached a patch that fixes the assertion error. The content of v1 patch by
Mr. Nagata is also included. I would appreciate it if you review my patch.

Regards,
Rintaro Ikeda

Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Next
From: Quan Zongliang
Date:
Subject: Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization