Thread: Make pgbench exit on SIGINT more reliably
Hello, The way that pgbench handled SIGINT changed in 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a couple of unintended consequences, at least from what I can tell[1]. - CTRL-C no longer stops the program unless the right point in pgbench execution is hit - pgbench no longer exits with a non-zero exit code An easy reproduction of these problems is to run with a large scale factor like: pgbench -i -s 500000. Then try to CTRL-C the program. The attached set of patches fixes this problem by allowing callers of setup_cancel_handler() to attach a post-PQcancel callback. In this case, we just call _exit(2). In addition, I noticed that psql had an EXIT_USER constant, so I moved the common exit codes from src/bin/psql/settings.h to src/include/fe_utils/exit_codes.h and made pgbench exit with EXIT_USER. [1]: https://www.postgresql.org/message-id/alpine.DEB.2.21.1910311939430.27369@lancre -- Tristan Partin Neon (https://neon.tech)
Attachment
Here is a v2 that handles the Windows case that I seemingly missed in my first readthrough of this code. -- Tristan Partin Neon (https://neon.tech)
Attachment
On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote: > The way that pgbench handled SIGINT changed in > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a > couple of unintended consequences, at least from what I can tell[1]. > > - CTRL-C no longer stops the program unless the right point in pgbench > execution is hit > - pgbench no longer exits with a non-zero exit code > > An easy reproduction of these problems is to run with a large scale > factor like: pgbench -i -s 500000. Then try to CTRL-C the program. This comes from the code path where the data is generated client-side, and where the current CancelRequested may not be that responsive, isn't it? -- Michael
Attachment
On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote: > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote: > > The way that pgbench handled SIGINT changed in > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a > > couple of unintended consequences, at least from what I can tell[1]. > > > > - CTRL-C no longer stops the program unless the right point in pgbench > > execution is hit > > - pgbench no longer exits with a non-zero exit code > > > > An easy reproduction of these problems is to run with a large scale > > factor like: pgbench -i -s 500000. Then try to CTRL-C the program. > > This comes from the code path where the data is generated client-side, > and where the current CancelRequested may not be that responsive, > isn't it? It comes from the fact that CancelRequested is only checked within the generation of the pgbench_accounts table, but with a large enough scale factor or enough latency, say cross-continent communication, generating the other tables can be time consuming too. But, yes you are more likely to run into this issue when generating client-side data. -- Tristan Partin Neon (https://neon.tech)
On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote: > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote: > > The way that pgbench handled SIGINT changed in > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a > > couple of unintended consequences, at least from what I can tell[1]. > > > > - CTRL-C no longer stops the program unless the right point in pgbench > > execution is hit > > - pgbench no longer exits with a non-zero exit code > > > > An easy reproduction of these problems is to run with a large scale > > factor like: pgbench -i -s 500000. Then try to CTRL-C the program. > > This comes from the code path where the data is generated client-side, > and where the current CancelRequested may not be that responsive, > isn't it? Yes, that is exactly it. There is only a single check for CancelRequested in the client-side data generation at the moment. -- Tristan Partin Neon (https://neon.tech)
Did not even remember sending an original reply. Disregard. -- Tristan Partin Neon (https://neon.tech)
On Wed, 24 May 2023 08:58:46 -0500 "Tristan Partin" <tristan@neon.tech> wrote: > On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote: > > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote: > > > The way that pgbench handled SIGINT changed in > > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a > > > couple of unintended consequences, at least from what I can tell[1]. > > > > > > - CTRL-C no longer stops the program unless the right point in pgbench > > > execution is hit > > > - pgbench no longer exits with a non-zero exit code > > > > > > An easy reproduction of these problems is to run with a large scale > > > factor like: pgbench -i -s 500000. Then try to CTRL-C the program. > > > > This comes from the code path where the data is generated client-side, > > and where the current CancelRequested may not be that responsive, > > isn't it? > > It comes from the fact that CancelRequested is only checked within the > generation of the pgbench_accounts table, but with a large enough scale > factor or enough latency, say cross-continent communication, generating > the other tables can be time consuming too. But, yes you are more likely > to run into this issue when generating client-side data. If I understand correctly, your patch allows to exit pgbench immediately during a client-side data generation even while the tables other than pgbench_accounts are processed. It can be useful when the scale factor is large. However, the same purpose seems to be achieved by you other patch [1]. Is the latter your latest proposal that replaces the patch in this thread?ad? [1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po Also, your proposal includes to change the exit code when pgbench is cancelled by SIGINT. I think it is nice because this will make it easy to understand and clarify the result of the initialization. Currently, pgbench returns 0 when the initialization is cancelled while populating pgbench_branches or pgbench_tellers, but the resultant pgbench_accounts has only one row, which is an inconsistent result. Returning the specific value for SIGINT cancel can let user know it explicitly. In addition, it seems better if an error message to inform would be output. For the above purpose, the patch moved exit codes of psql to fe_utils to be shared. However, I am not sure this is good way. Each frontend-tool may have it own exit code, for example. "2" means "bad connection" in psql [2], but "errors during the run" in pgbench [3]. So, I think it is better to define them separately. [2] https://www.postgresql.org/docs/current/app-psql.html#id-1.9.4.20.7 [3] https://www.postgresql.org/docs/current/pgbench.html#id=id-1.9.4.11.7 Regards, Yugo Nagata > -- > Tristan Partin > Neon (https://neon.tech) > > -- Yugo NAGATA <nagata@sraoss.co.jp>
On Mon Jun 19, 2023 at 6:39 AM PDT, Yugo NAGATA wrote: > On Wed, 24 May 2023 08:58:46 -0500 > "Tristan Partin" <tristan@neon.tech> wrote: > > > On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote: > > > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote: > > > > The way that pgbench handled SIGINT changed in > > > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a > > > > couple of unintended consequences, at least from what I can tell[1]. > > > > > > > > - CTRL-C no longer stops the program unless the right point in pgbench > > > > execution is hit > > > > - pgbench no longer exits with a non-zero exit code > > > > > > > > An easy reproduction of these problems is to run with a large scale > > > > factor like: pgbench -i -s 500000. Then try to CTRL-C the program. > > > > > > This comes from the code path where the data is generated client-side, > > > and where the current CancelRequested may not be that responsive, > > > isn't it? > > > > It comes from the fact that CancelRequested is only checked within the > > generation of the pgbench_accounts table, but with a large enough scale > > factor or enough latency, say cross-continent communication, generating > > the other tables can be time consuming too. But, yes you are more likely > > to run into this issue when generating client-side data. > > If I understand correctly, your patch allows to exit pgbench immediately > during a client-side data generation even while the tables other than > pgbench_accounts are processed. It can be useful when the scale factor > is large. However, the same purpose seems to be achieved by you other patch [1]. > Is the latter your latest proposal that replaces the patch in this thread?ad? > > [1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po The other patch does not replace this one. They are entirely separate. > Also, your proposal includes to change the exit code when pgbench is cancelled by > SIGINT. I think it is nice because this will make it easy to understand and clarify > the result of the initialization. > > Currently, pgbench returns 0 when the initialization is cancelled while populating > pgbench_branches or pgbench_tellers, but the resultant pgbench_accounts has only > one row, which is an inconsistent result. Returning the specific value for SIGINT > cancel can let user know it explicitly. In addition, it seems better if an error > message to inform would be output. > > For the above purpose, the patch moved exit codes of psql to fe_utils to be shared. > However, I am not sure this is good way. Each frontend-tool may have it own exit code, > for example. "2" means "bad connection" in psql [2], but "errors during the run" in > pgbench [3]. So, I think it is better to define them separately. > > [2] https://www.postgresql.org/docs/current/app-psql.html#id-1.9.4.20.7 > [3] https://www.postgresql.org/docs/current/pgbench.html#id=id-1.9.4.11.7 I can change it. I just figured that sharing exit code definitions would've been preferrable. I will post a v3 some time soon which removes that patch. Thanks for your review :). -- Tristan Partin Neon (https://neon.tech)
On Mon, 19 Jun 2023 16:49:05 -0700 "Tristan Partin" <tristan@neon.tech> wrote: > On Mon Jun 19, 2023 at 6:39 AM PDT, Yugo NAGATA wrote: > > On Wed, 24 May 2023 08:58:46 -0500 > > "Tristan Partin" <tristan@neon.tech> wrote: > > > > > On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote: > > > > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote: > > > > > The way that pgbench handled SIGINT changed in > > > > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a > > > > > couple of unintended consequences, at least from what I can tell[1]. > > > > > > > > > > - CTRL-C no longer stops the program unless the right point in pgbench > > > > > execution is hit > > > > > - pgbench no longer exits with a non-zero exit code > > > > > > > > > > An easy reproduction of these problems is to run with a large scale > > > > > factor like: pgbench -i -s 500000. Then try to CTRL-C the program. > > > > > > > > This comes from the code path where the data is generated client-side, > > > > and where the current CancelRequested may not be that responsive, > > > > isn't it? > > > > > > It comes from the fact that CancelRequested is only checked within the > > > generation of the pgbench_accounts table, but with a large enough scale > > > factor or enough latency, say cross-continent communication, generating > > > the other tables can be time consuming too. But, yes you are more likely > > > to run into this issue when generating client-side data. > > > > If I understand correctly, your patch allows to exit pgbench immediately > > during a client-side data generation even while the tables other than > > pgbench_accounts are processed. It can be useful when the scale factor > > is large. However, the same purpose seems to be achieved by you other patch [1]. > > Is the latter your latest proposal that replaces the patch in this thread?ad? > > > > [1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po > > The other patch does not replace this one. They are entirely separate. After applying the other patch, CancelRequested would be checked enough frequently even during initialization of pgbench_branches and pgbench_tellers, and it will allow the initialization step to be cancelled immediately even if the scale factor is high. So, is it unnecessary to modify setup_cancel_handler anyway? I think it would be still nice to introduce a new exit code for query cancel separately. > > > Also, your proposal includes to change the exit code when pgbench is cancelled by > > SIGINT. I think it is nice because this will make it easy to understand and clarify > > the result of the initialization. > > > > Currently, pgbench returns 0 when the initialization is cancelled while populating > > pgbench_branches or pgbench_tellers, but the resultant pgbench_accounts has only > > one row, which is an inconsistent result. Returning the specific value for SIGINT > > cancel can let user know it explicitly. In addition, it seems better if an error > > message to inform would be output. > > > > For the above purpose, the patch moved exit codes of psql to fe_utils to be shared. > > However, I am not sure this is good way. Each frontend-tool may have it own exit code, > > for example. "2" means "bad connection" in psql [2], but "errors during the run" in > > pgbench [3]. So, I think it is better to define them separately. > > > > [2] https://www.postgresql.org/docs/current/app-psql.html#id-1.9.4.20.7 > > [3] https://www.postgresql.org/docs/current/pgbench.html#id=id-1.9.4.11.7 > > I can change it. I just figured that sharing exit code definitions > would've been preferrable. I will post a v3 some time soon which removes > that patch. Great! > > Thanks for your review :). > > -- > Tristan Partin > Neon (https://neon.tech) -- Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, Jun 22, 2023 at 02:58:14PM +0900, Yugo NAGATA wrote: > On Mon, 19 Jun 2023 16:49:05 -0700 > "Tristan Partin" <tristan@neon.tech> wrote: >> On Mon Jun 19, 2023 at 6:39 AM PDT, Yugo NAGATA wrote: >>> [1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po >> >> The other patch does not replace this one. They are entirely separate. > > After applying the other patch, CancelRequested would be checked enough frequently > even during initialization of pgbench_branches and pgbench_tellers, and it will > allow the initialization step to be cancelled immediately even if the scale factor > is high. So, is it unnecessary to modify setup_cancel_handler anyway? > > I think it would be still nice to introduce a new exit code for query cancel separately. I have the same impression as Nagata-san while going again through the proposal of this thread. COPY is more responsive to interruptions, and is always going to have a much better performance than individual or multi-value INSERTs for the bulk-loading of data, so we could just move on with what's proposed on the other thread and solve the problem of this thread on top of improving the loading performance. What are the benefits we gain with the proposal of this thread once we switch to COPY as method for the client-side data generation? If the argument is to be able switch to a different error code on cancellations for pgbench, that sounds a bit thin to me versus the argument of keeping the cancellation callback path as simple as possible. -- Michael
Attachment
On Thu Jun 22, 2023 at 6:19 PM CDT, Michael Paquier wrote: > On Thu, Jun 22, 2023 at 02:58:14PM +0900, Yugo NAGATA wrote: > > On Mon, 19 Jun 2023 16:49:05 -0700 > > "Tristan Partin" <tristan@neon.tech> wrote: > >> On Mon Jun 19, 2023 at 6:39 AM PDT, Yugo NAGATA wrote: > >>> [1] https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po > >> > >> The other patch does not replace this one. They are entirely separate. > > > > After applying the other patch, CancelRequested would be checked enough frequently > > even during initialization of pgbench_branches and pgbench_tellers, and it will > > allow the initialization step to be cancelled immediately even if the scale factor > > is high. So, is it unnecessary to modify setup_cancel_handler anyway? > > > > I think it would be still nice to introduce a new exit code for query cancel separately. > > I have the same impression as Nagata-san while going again through > the proposal of this thread. COPY is more responsive to > interruptions, and is always going to have a much better performance > than individual or multi-value INSERTs for the bulk-loading of data, > so we could just move on with what's proposed on the other thread and > solve the problem of this thread on top of improving the loading > performance. > > What are the benefits we gain with the proposal of this thread once we > switch to COPY as method for the client-side data generation? If > the argument is to be able switch to a different error code on > cancellations for pgbench, that sounds a bit thin to me versus the > argument of keeping the cancellation callback path as simple as > possible. I would say there probably isn't much benefit if you think the polling for CancelRequested is fine. Looking at the other patch, I think it might be simple to add an exit code for SIGINT. But I think it might be best to do it after that patch is merged. I added the other patch to the commitfest for July. Holding off on this one. -- Tristan Partin Neon (https://neon.tech)
On Tue, Jun 27, 2023 at 09:42:05AM -0500, Tristan Partin wrote: > I would say there probably isn't much benefit if you think the polling > for CancelRequested is fine. Looking at the other patch, I think it > might be simple to add an exit code for SIGINT. But I think it might be > best to do it after that patch is merged. I added the other patch to the > commitfest for July. Holding off on this one. Okay, I am going to jump on the patch to switch to COPY, then. -- Michael
Attachment
On Mon Jul 10, 2023 at 10:29 PM CDT, Michael Paquier wrote: > On Tue, Jun 27, 2023 at 09:42:05AM -0500, Tristan Partin wrote: > > I would say there probably isn't much benefit if you think the polling > > for CancelRequested is fine. Looking at the other patch, I think it > > might be simple to add an exit code for SIGINT. But I think it might be > > best to do it after that patch is merged. I added the other patch to the > > commitfest for July. Holding off on this one. > > Okay, I am going to jump on the patch to switch to COPY, then. After looking at the other patch some more, I don't think there is a good way to reliably exit from SIGINT. The only time pgbench ever polls for CancelRequested is in initPopulateTable(). So if you hit CTRL+C at any time during the execution of the other initalization steps, you're out of luck. The default initialization steps are dtgvp, so after g we have vacuuming and primary keys. From what I can tell pgbench will continue to run these steps even after it has received a SIGINT. This behavior seems unintended and odd to me. Though, maybe I am missing something. -- Tristan Partin Neon (https://neon.tech)