Thread: psql's \watch is broken
If I do something like this:
explain (analyze) select * from pgbench_accounts \watch 1
It behaves as expected. But once I break out of the loop with ctrl-C, then if I execute the same thing again it executes the command once, but shows no output and doesn't loop. It seems like some flag is getting set with ctrl-C, but then never gets reset.
It was broken in this commit:
commit a4fd3aa719e8f97299dfcf1a8f79b3017e2b8d8b
Author: Michael Paquier <michael@paquier.xyz>
Date: Mon Dec 2 11:18:56 2019 +0900
Refactor query cancellation code into src/fe_utils/
Author: Michael Paquier <michael@paquier.xyz>
Date: Mon Dec 2 11:18:56 2019 +0900
Refactor query cancellation code into src/fe_utils/
I've not dug into code itself, I just bisected it.
Cheers,
Jeff
> explain (analyze) select * from pgbench_accounts \watch 1 > > It behaves as expected. But once I break out of the loop with ctrl-C, then > if I execute the same thing again it executes the command once, but shows > no output and doesn't loop. It seems like some flag is getting set with > ctrl-C, but then never gets reset. > > It was broken in this commit: > > commit a4fd3aa719e8f97299dfcf1a8f79b3017e2b8d8b > Author: Michael Paquier <michael@paquier.xyz> > Date: Mon Dec 2 11:18:56 2019 +0900 > > Refactor query cancellation code into src/fe_utils/ > > I've not dug into code itself, I just bisected it. Thanks for the report. I'll look into it. -- Fabien.
On Sat, Dec 14, 2019 at 12:09:51AM +0100, Fabien COELHO wrote: > >> explain (analyze) select * from pgbench_accounts \watch 1 >> >> It behaves as expected. But once I break out of the loop with ctrl-C, then >> if I execute the same thing again it executes the command once, but shows >> no output and doesn't loop. It seems like some flag is getting set with >> ctrl-C, but then never gets reset. >> >> >> I've not dug into code itself, I just bisected it. > > Thanks for the report. I'll look into it. Looked at it already. And yes, I can see the difference. This comes from the switch from cancel_pressed to CancelRequested in psql, especially PSQLexecWatch() in this case. And actually, now that I look at it, I think that we should simply get rid of cancel_pressed in psql completely and replace it with CancelRequested. This also removes the need of having cancel_pressed defined in print.c, which was not really wanted originally. Attached is a patch which addresses the issue for me, and cleans up the code while on it. Fabien, Jeff, can you confirm please? -- Michael
Attachment
On Fri, Dec 13, 2019 at 9:45 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Dec 14, 2019 at 12:09:51AM +0100, Fabien COELHO wrote:
>
>> explain (analyze) select * from pgbench_accounts \watch 1
>>
>> It behaves as expected. But once I break out of the loop with ctrl-C, then
>> if I execute the same thing again it executes the command once, but shows
>> no output and doesn't loop. It seems like some flag is getting set with
>> ctrl-C, but then never gets reset.
>>
>>
>> I've not dug into code itself, I just bisected it.
>
> Thanks for the report. I'll look into it.
Looked at it already. And yes, I can see the difference. This comes
from the switch from cancel_pressed to CancelRequested in psql,
especially PSQLexecWatch() in this case. And actually, now that I
look at it, I think that we should simply get rid of cancel_pressed in
psql completely and replace it with CancelRequested. This also
removes the need of having cancel_pressed defined in print.c, which
was not really wanted originally. Attached is a patch which addresses
the issue for me, and cleans up the code while on it. Fabien, Jeff,
can you confirm please?
--
Michael
This works for me.
Thanks,
Jeff
>>> I've not dug into code itself, I just bisected it. >> >> Thanks for the report. I'll look into it. > > Looked at it already. Ah, the magic of timezones! > And yes, I can see the difference. This comes from the switch from > cancel_pressed to CancelRequested in psql, especially PSQLexecWatch() in > this case. And actually, now that I look at it, I think that we should > simply get rid of cancel_pressed in psql completely and replace it with > CancelRequested. This also removes the need of having cancel_pressed > defined in print.c, which was not really wanted originally. Attached is > a patch which addresses the issue for me, and cleans up the code while > on it. Fabien, Jeff, can you confirm please? Yep. Patch applies cleanly, compiles, works for me as well. -- Fabien.
Michael Paquier <michael@paquier.xyz> writes: > Looked at it already. And yes, I can see the difference. This comes > from the switch from cancel_pressed to CancelRequested in psql, > especially PSQLexecWatch() in this case. And actually, now that I > look at it, I think that we should simply get rid of cancel_pressed in > psql completely and replace it with CancelRequested. This also > removes the need of having cancel_pressed defined in print.c, which > was not really wanted originally. Attached is a patch which addresses > the issue for me, and cleans up the code while on it. Fabien, Jeff, > can you confirm please? Given the rather small number of existing uses of CancelRequested, I wonder if it wouldn't be a better idea to rename it to cancel_pressed? Also, perhaps I am missing something, but I do not see anyplace in the current code base that ever *clears* CancelRequested. How much has this code been tested? Is it really sane to remove the setting of that flag from psql_cancel_callback, as this patch does? Is it sane that CancelRequested isn't declared volatile? regards, tom lane
Hello Tom, My 0.02 €: > Given the rather small number of existing uses of CancelRequested, > I wonder if it wouldn't be a better idea to rename it to cancel_pressed? I prefer the former because it is more functional (a cancellation has been requested, however the mean to do so) while "pressed" rather suggest a particular operation. > Also, perhaps I am missing something, but I do not see anyplace in the > current code base that ever *clears* CancelRequested. This was already like that in the initial version before the refactoring. ./src/bin/scripts/common.h:extern bool CancelRequested; ./src/bin/scripts/common.c:bool CancelRequested = false; ./src/bin/scripts/common.c: CancelRequested = true; ./src/bin/scripts/common.c: CancelRequested = true; ./src/bin/scripts/common.c: CancelRequested = true; ./src/bin/scripts/common.c: CancelRequested = true; ./src/bin/scripts/vacuumdb.c: if (CancelRequested) ./src/bin/scripts/vacuumdb.c: if (CancelRequested) ./src/bin/scripts/vacuumdb.c: if (i < 0 || CancelRequested) However "cancel_request" resets are in "psql/mainloop.c", and are changed to "CancelRequest = false" resets by Michaël patch, so all seems fine. > How much has this code been tested? Is it really sane to remove the > setting of that flag from psql_cancel_callback, as this patch does? ISTM that the callback is called from a function which sets CancelRequest? > Is it sane that CancelRequested isn't declared volatile? I agree that it would seem appropriate, but the initial version I refactored was not declaring CancelRequested as volatile, so I did not change that. However, "cancel_pressed" is volatile, so merging the two would indeed suggest to declare it as volatile. -- Fabien.
On Sun, Dec 15, 2019 at 10:35:54PM +0100, Fabien COELHO wrote: >> Also, perhaps I am missing something, but I do not see anyplace in the >> current code base that ever *clears* CancelRequested. For bin/scripts/, that's not really a problem, because all code paths triggering a cancellation, aka in vacuumdb and reindexdb, exit immediately. >> How much has this code been tested? Sorry about that, not enough visibly :( >> Is it really sane to remove the setting of that flag from >> psql_cancel_callback, as this patch does? > > ISTM that the callback is called from a function which sets CancelRequest? Hmm, that's not right. Note that there is a subtle difference between psql and bin/scripts/. In the case of the scripts, originally CancelRequested tracks if a cancellation request has been sent to the backend or not. Hence, if the client has not called SetCancelConn() to set up cancelConn, then CancelRequested is switched to true all the time. Now, if cancelConn is set, but a cancellation request has not correctly completed, then CancelRequested never set to true. In the case of psql, the original code sets cancel_pressed all the time, even if a cancellation request has been done and that it failed, and did not care if cancelConn was set or not. So, the intention of psql is to always track when a cancellation attempt is done, even if it has failed to issue it, while for all our other frontends we want to make sure that a cancellation attempt is done, and that the cancellation has succeeded before looping out and exit. >> Is it sane that CancelRequested isn't declared volatile? > > I agree that it would seem appropriate, but the initial version I refactored > was not declaring CancelRequested as volatile, so I did not change that. > However, "cancel_pressed" is volatile, so merging the two > would indeed suggest to declare it as volatile. Actually, it seems to me that both suggestions are not completely right either about that stuff since the flag has been introduced in bin/scripts/ in a1792320, no? The way to handle such variables safely in a signal handler it to mark them as volatile and sig_atomic_t. The same can be said about the older cancel_pressed as of 718bb2c in psql. So fixed all that while on it. As the concepts behind cancel_pressed and CancelRequested are different, we need to keep cancel_pressed and make psql use it. And the callback used for WIN32 also needs to set the flag. I also think that we should do a better effort in documenting CancelRequested properly in cancel.c. All that should be fixed as of the attached, tested on Linux and from a Windows console. From a point of view of consistency, this actually brings back the code of psql to the same state as it was before a4fd3aa, except that we still have the refactored pieces. Thoughts? -- Michael
Attachment
On Mon, Dec 16, 2019 at 11:40:07AM +0900, Michael Paquier wrote: > As the concepts behind cancel_pressed and CancelRequested are > different, we need to keep cancel_pressed and make psql use it. And > the callback used for WIN32 also needs to set the flag. I also think > that we should do a better effort in documenting CancelRequested > properly in cancel.c. All that should be fixed as of the attached, > tested on Linux and from a Windows console. From a point of view of > consistency, this actually brings back the code of psql to the same > state as it was before a4fd3aa, except that we still have the > refactored pieces. Merging both flags can actually prove to be tricky, as we have some code paths involving --single-step where psql visibly assumes that a cancellation pressed does not necessarily imply one that succeeds is there is a cancellation object around (ExecQueryTuples, tuple printing and \copy). So I have fixed the issue by making the code of psql consistent with what we had before a4fd3aa. I think that it should be actually possible to merge CancelRequested and cancel_pressed while keeping the user-visible changes acceptable, and this requires a very careful lookup. -- Michael