Thread: Make pgbench exit on SIGINT more reliably

Make pgbench exit on SIGINT more reliably

From
"Tristan Partin"
Date:
Hello,

The way that pgbench handled SIGINT changed in
1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a
couple of unintended consequences, at least from what I can tell[1].

- CTRL-C no longer stops the program unless the right point in pgbench
  execution is hit
- pgbench no longer exits with a non-zero exit code

An easy reproduction of these problems is to run with a large scale
factor like: pgbench -i -s 500000. Then try to CTRL-C the program.

The attached set of patches fixes this problem by allowing callers of
setup_cancel_handler() to attach a post-PQcancel callback. In this case,
we just call _exit(2). In addition, I noticed that psql had an EXIT_USER
constant, so I moved the common exit codes from src/bin/psql/settings.h
to src/include/fe_utils/exit_codes.h and made pgbench exit with
EXIT_USER.

[1]: https://www.postgresql.org/message-id/alpine.DEB.2.21.1910311939430.27369@lancre
--
Tristan Partin
Neon (https://neon.tech)

Attachment

Re: Make pgbench exit on SIGINT more reliably

From
"Tristan Partin"
Date:
Here is a v2 that handles the Windows case that I seemingly missed in my
first readthrough of this code.

--
Tristan Partin
Neon (https://neon.tech)

Attachment

Re: Make pgbench exit on SIGINT more reliably

From
Michael Paquier
Date:
On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote:
> The way that pgbench handled SIGINT changed in
> 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a
> couple of unintended consequences, at least from what I can tell[1].
>
> - CTRL-C no longer stops the program unless the right point in pgbench
>   execution is hit
> - pgbench no longer exits with a non-zero exit code
>
> An easy reproduction of these problems is to run with a large scale
> factor like: pgbench -i -s 500000. Then try to CTRL-C the program.

This comes from the code path where the data is generated client-side,
and where the current CancelRequested may not be that responsive,
isn't it?
--
Michael

Attachment

Re: Make pgbench exit on SIGINT more reliably

From
"Tristan Partin"
Date:
On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote:
> On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote:
> > The way that pgbench handled SIGINT changed in
> > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a
> > couple of unintended consequences, at least from what I can tell[1].
> >
> > - CTRL-C no longer stops the program unless the right point in pgbench
> >   execution is hit
> > - pgbench no longer exits with a non-zero exit code
> >
> > An easy reproduction of these problems is to run with a large scale
> > factor like: pgbench -i -s 500000. Then try to CTRL-C the program.
>
> This comes from the code path where the data is generated client-side,
> and where the current CancelRequested may not be that responsive,
> isn't it?

It comes from the fact that CancelRequested is only checked within the
generation of the pgbench_accounts table, but with a large enough scale
factor or enough latency, say cross-continent communication, generating
the other tables can be time consuming too. But, yes you are more likely
to run into this issue when generating client-side data.

--
Tristan Partin
Neon (https://neon.tech)



Re: Make pgbench exit on SIGINT more reliably

From
"Tristan Partin"
Date:
On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote:
> On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote:
> > The way that pgbench handled SIGINT changed in
> > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a
> > couple of unintended consequences, at least from what I can tell[1].
> >
> > - CTRL-C no longer stops the program unless the right point in pgbench
> >   execution is hit
> > - pgbench no longer exits with a non-zero exit code
> >
> > An easy reproduction of these problems is to run with a large scale
> > factor like: pgbench -i -s 500000. Then try to CTRL-C the program.
>
> This comes from the code path where the data is generated client-side,
> and where the current CancelRequested may not be that responsive,
> isn't it?

Yes, that is exactly it. There is only a single check for
CancelRequested in the client-side data generation at the moment.

--
Tristan Partin
Neon (https://neon.tech)



Re: Make pgbench exit on SIGINT more reliably

From
"Tristan Partin"
Date:
Did not even remember sending an original reply. Disregard.

--
Tristan Partin
Neon (https://neon.tech)



Re: Make pgbench exit on SIGINT more reliably

From
Yugo NAGATA
Date:
On Wed, 24 May 2023 08:58:46 -0500
"Tristan Partin" <tristan@neon.tech> wrote:

> On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote:
> > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote:
> > > The way that pgbench handled SIGINT changed in
> > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a
> > > couple of unintended consequences, at least from what I can tell[1].
> > > 
> > > - CTRL-C no longer stops the program unless the right point in pgbench
> > >   execution is hit 
> > > - pgbench no longer exits with a non-zero exit code
> > > 
> > > An easy reproduction of these problems is to run with a large scale
> > > factor like: pgbench -i -s 500000. Then try to CTRL-C the program.
> >
> > This comes from the code path where the data is generated client-side,
> > and where the current CancelRequested may not be that responsive,
> > isn't it?
> 
> It comes from the fact that CancelRequested is only checked within the
> generation of the pgbench_accounts table, but with a large enough scale
> factor or enough latency, say cross-continent communication, generating
> the other tables can be time consuming too. But, yes you are more likely
> to run into this issue when generating client-side data.

If I understand correctly, your patch allows to exit pgbench immediately
during a client-side data generation even while the tables other than
pgbench_accounts are processed. It can be useful when the scale factor
is large. However, the same purpose seems to be achieved by you other patch [1].
Is the latter your latest proposal that replaces the patch in this thread?ad?

[1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po

Also, your proposal includes to change the exit code when pgbench is cancelled by
SIGINT. I think it is nice because this will make it easy to understand and clarify 
the result of the initialization. 

Currently, pgbench returns 0 when the initialization is cancelled while populating
pgbench_branches or pgbench_tellers, but the resultant pgbench_accounts has only
one row, which is an inconsistent result. Returning the specific value for SIGINT
cancel can let user know it explicitly. In addition, it seems better if an error
message to inform would be output. 

For the above purpose, the patch moved exit codes of psql to fe_utils to be shared.
However,  I am not sure this is good way. Each frontend-tool may have it own exit code,
for example. "2" means "bad connection" in psql [2], but "errors during the run" in
pgbench [3]. So, I think it is better to define them separately.

[2] https://www.postgresql.org/docs/current/app-psql.html#id-1.9.4.20.7
[3] https://www.postgresql.org/docs/current/pgbench.html#id=id-1.9.4.11.7
 
Regards,
Yugo Nagata

> -- 
> Tristan Partin
> Neon (https://neon.tech)
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>



Re: Make pgbench exit on SIGINT more reliably

From
"Tristan Partin"
Date:
On Mon Jun 19, 2023 at 6:39 AM PDT, Yugo NAGATA wrote:
> On Wed, 24 May 2023 08:58:46 -0500
> "Tristan Partin" <tristan@neon.tech> wrote:
>
> > On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote:
> > > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote:
> > > > The way that pgbench handled SIGINT changed in
> > > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a
> > > > couple of unintended consequences, at least from what I can tell[1].
> > > >
> > > > - CTRL-C no longer stops the program unless the right point in pgbench
> > > >   execution is hit
> > > > - pgbench no longer exits with a non-zero exit code
> > > >
> > > > An easy reproduction of these problems is to run with a large scale
> > > > factor like: pgbench -i -s 500000. Then try to CTRL-C the program.
> > >
> > > This comes from the code path where the data is generated client-side,
> > > and where the current CancelRequested may not be that responsive,
> > > isn't it?
> >
> > It comes from the fact that CancelRequested is only checked within the
> > generation of the pgbench_accounts table, but with a large enough scale
> > factor or enough latency, say cross-continent communication, generating
> > the other tables can be time consuming too. But, yes you are more likely
> > to run into this issue when generating client-side data.
>
> If I understand correctly, your patch allows to exit pgbench immediately
> during a client-side data generation even while the tables other than
> pgbench_accounts are processed. It can be useful when the scale factor
> is large. However, the same purpose seems to be achieved by you other patch [1].
> Is the latter your latest proposal that replaces the patch in this thread?ad?
>
> [1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po

The other patch does not replace this one. They are entirely separate.

> Also, your proposal includes to change the exit code when pgbench is cancelled by
> SIGINT. I think it is nice because this will make it easy to understand and clarify
> the result of the initialization.
>
> Currently, pgbench returns 0 when the initialization is cancelled while populating
> pgbench_branches or pgbench_tellers, but the resultant pgbench_accounts has only
> one row, which is an inconsistent result. Returning the specific value for SIGINT
> cancel can let user know it explicitly. In addition, it seems better if an error
> message to inform would be output.
>
> For the above purpose, the patch moved exit codes of psql to fe_utils to be shared.
> However,  I am not sure this is good way. Each frontend-tool may have it own exit code,
> for example. "2" means "bad connection" in psql [2], but "errors during the run" in
> pgbench [3]. So, I think it is better to define them separately.
>
> [2] https://www.postgresql.org/docs/current/app-psql.html#id-1.9.4.20.7
> [3] https://www.postgresql.org/docs/current/pgbench.html#id=id-1.9.4.11.7

I can change it. I just figured that sharing exit code definitions
would've been preferrable. I will post a v3 some time soon which removes
that patch.

Thanks for your review :).

--
Tristan Partin
Neon (https://neon.tech)



Re: Make pgbench exit on SIGINT more reliably

From
Yugo NAGATA
Date:
On Mon, 19 Jun 2023 16:49:05 -0700
"Tristan Partin" <tristan@neon.tech> wrote:

> On Mon Jun 19, 2023 at 6:39 AM PDT, Yugo NAGATA wrote:
> > On Wed, 24 May 2023 08:58:46 -0500
> > "Tristan Partin" <tristan@neon.tech> wrote:
> >
> > > On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote:
> > > > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote:
> > > > > The way that pgbench handled SIGINT changed in
> > > > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a
> > > > > couple of unintended consequences, at least from what I can tell[1].
> > > > > 
> > > > > - CTRL-C no longer stops the program unless the right point in pgbench
> > > > >   execution is hit 
> > > > > - pgbench no longer exits with a non-zero exit code
> > > > > 
> > > > > An easy reproduction of these problems is to run with a large scale
> > > > > factor like: pgbench -i -s 500000. Then try to CTRL-C the program.
> > > >
> > > > This comes from the code path where the data is generated client-side,
> > > > and where the current CancelRequested may not be that responsive,
> > > > isn't it?
> > > 
> > > It comes from the fact that CancelRequested is only checked within the
> > > generation of the pgbench_accounts table, but with a large enough scale
> > > factor or enough latency, say cross-continent communication, generating
> > > the other tables can be time consuming too. But, yes you are more likely
> > > to run into this issue when generating client-side data.
> >
> > If I understand correctly, your patch allows to exit pgbench immediately
> > during a client-side data generation even while the tables other than
> > pgbench_accounts are processed. It can be useful when the scale factor
> > is large. However, the same purpose seems to be achieved by you other patch [1].
> > Is the latter your latest proposal that replaces the patch in this thread?ad?
> >
> > [1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po
> 
> The other patch does not replace this one. They are entirely separate.

After applying the other patch, CancelRequested would be checked enough frequently
even during initialization of pgbench_branches and pgbench_tellers, and it will
allow the initialization step to be cancelled immediately even if the scale factor
is high. So, is it unnecessary to modify setup_cancel_handler anyway?

I think it would be still nice to introduce a new exit code for query cancel separately.

> 
> > Also, your proposal includes to change the exit code when pgbench is cancelled by
> > SIGINT. I think it is nice because this will make it easy to understand and clarify 
> > the result of the initialization. 
> >
> > Currently, pgbench returns 0 when the initialization is cancelled while populating
> > pgbench_branches or pgbench_tellers, but the resultant pgbench_accounts has only
> > one row, which is an inconsistent result. Returning the specific value for SIGINT
> > cancel can let user know it explicitly. In addition, it seems better if an error
> > message to inform would be output. 
> >
> > For the above purpose, the patch moved exit codes of psql to fe_utils to be shared.
> > However,  I am not sure this is good way. Each frontend-tool may have it own exit code,
> > for example. "2" means "bad connection" in psql [2], but "errors during the run" in
> > pgbench [3]. So, I think it is better to define them separately.
> >
> > [2] https://www.postgresql.org/docs/current/app-psql.html#id-1.9.4.20.7
> > [3] https://www.postgresql.org/docs/current/pgbench.html#id=id-1.9.4.11.7
> 
> I can change it. I just figured that sharing exit code definitions
> would've been preferrable. I will post a v3 some time soon which removes
> that patch.

Great!

> 
> Thanks for your review :).
> 
> -- 
> Tristan Partin
> Neon (https://neon.tech)


-- 
Yugo NAGATA <nagata@sraoss.co.jp>



Re: Make pgbench exit on SIGINT more reliably

From
Michael Paquier
Date:
On Thu, Jun 22, 2023 at 02:58:14PM +0900, Yugo NAGATA wrote:
> On Mon, 19 Jun 2023 16:49:05 -0700
> "Tristan Partin" <tristan@neon.tech> wrote:
>> On Mon Jun 19, 2023 at 6:39 AM PDT, Yugo NAGATA wrote:
>>> [1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po
>>
>> The other patch does not replace this one. They are entirely separate.
>
> After applying the other patch, CancelRequested would be checked enough frequently
> even during initialization of pgbench_branches and pgbench_tellers, and it will
> allow the initialization step to be cancelled immediately even if the scale factor
> is high. So, is it unnecessary to modify setup_cancel_handler anyway?
>
> I think it would be still nice to introduce a new exit code for query cancel separately.

I have the same impression as Nagata-san while going again through
the proposal of this thread.  COPY is more responsive to
interruptions, and is always going to have a much better performance
than individual or multi-value INSERTs for the bulk-loading of data,
so we could just move on with what's proposed on the other thread and
solve the problem of this thread on top of improving the loading
performance.

What are the benefits we gain with the proposal of this thread once we
switch to COPY as method for the client-side data generation?  If
the argument is to be able switch to a different error code on
cancellations for pgbench, that sounds a bit thin to me versus the
argument of keeping the cancellation callback path as simple as
possible.
--
Michael

Attachment

Re: Make pgbench exit on SIGINT more reliably

From
"Tristan Partin"
Date:
On Thu Jun 22, 2023 at 6:19 PM CDT, Michael Paquier wrote:
> On Thu, Jun 22, 2023 at 02:58:14PM +0900, Yugo NAGATA wrote:
> > On Mon, 19 Jun 2023 16:49:05 -0700
> > "Tristan Partin" <tristan@neon.tech> wrote:
> >> On Mon Jun 19, 2023 at 6:39 AM PDT, Yugo NAGATA wrote:
> >>> [1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po
> >>
> >> The other patch does not replace this one. They are entirely separate.
> >
> > After applying the other patch, CancelRequested would be checked enough frequently
> > even during initialization of pgbench_branches and pgbench_tellers, and it will
> > allow the initialization step to be cancelled immediately even if the scale factor
> > is high. So, is it unnecessary to modify setup_cancel_handler anyway?
> >
> > I think it would be still nice to introduce a new exit code for query cancel separately.
>
> I have the same impression as Nagata-san while going again through
> the proposal of this thread.  COPY is more responsive to
> interruptions, and is always going to have a much better performance
> than individual or multi-value INSERTs for the bulk-loading of data,
> so we could just move on with what's proposed on the other thread and
> solve the problem of this thread on top of improving the loading
> performance.
>
> What are the benefits we gain with the proposal of this thread once we
> switch to COPY as method for the client-side data generation?  If
> the argument is to be able switch to a different error code on
> cancellations for pgbench, that sounds a bit thin to me versus the
> argument of keeping the cancellation callback path as simple as
> possible.

I would say there probably isn't much benefit if you think the polling
for CancelRequested is fine. Looking at the other patch, I think it
might be simple to add an exit code for SIGINT. But I think it might be
best to do it after that patch is merged. I added the other patch to the
commitfest for July. Holding off on this one.

--
Tristan Partin
Neon (https://neon.tech)



Re: Make pgbench exit on SIGINT more reliably

From
Michael Paquier
Date:
On Tue, Jun 27, 2023 at 09:42:05AM -0500, Tristan Partin wrote:
> I would say there probably isn't much benefit if you think the polling
> for CancelRequested is fine. Looking at the other patch, I think it
> might be simple to add an exit code for SIGINT. But I think it might be
> best to do it after that patch is merged. I added the other patch to the
> commitfest for July. Holding off on this one.

Okay, I am going to jump on the patch to switch to COPY, then.
--
Michael

Attachment

Re: Make pgbench exit on SIGINT more reliably

From
"Tristan Partin"
Date:
On Mon Jul 10, 2023 at 10:29 PM CDT, Michael Paquier wrote:
> On Tue, Jun 27, 2023 at 09:42:05AM -0500, Tristan Partin wrote:
> > I would say there probably isn't much benefit if you think the polling
> > for CancelRequested is fine. Looking at the other patch, I think it
> > might be simple to add an exit code for SIGINT. But I think it might be
> > best to do it after that patch is merged. I added the other patch to the
> > commitfest for July. Holding off on this one.
>
> Okay, I am going to jump on the patch to switch to COPY, then.

After looking at the other patch some more, I don't think there is a
good way to reliably exit from SIGINT. The only time pgbench ever polls
for CancelRequested is in initPopulateTable(). So if you hit CTRL+C at
any time during the execution of the other initalization steps, you're
out of luck. The default initialization steps are dtgvp, so after g we
have vacuuming and primary keys. From what I can tell pgbench will
continue to run these steps even after it has received a SIGINT.

This behavior seems unintended and odd to me. Though, maybe I am missing
something.

--
Tristan Partin
Neon (https://neon.tech)