Thread: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns truesporadically

BUG #15492: pg_cancel_backend(pg_backend_pid()) returns truesporadically

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15492
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 11.0
Operating system:   Windows 2012 R2
Description:

When performing `make standbycheck` I get sporadic failure:

============== running regression test queries        ==============
test hs_standby_check         ... ok
test hs_standby_allowed       ... ok
test hs_standby_disallowed    ... ok
test hs_standby_functions     ... FAILED

======================
 1 of 4 tests failed. 
======================

***
C:/tmp/postgrespro-standard-10.6.1/src/test/regress/expected/hs_standby_functions.out    Wed
Nov  7 01:14:03 2018
---
C:/tmp/postgrespro-standard-10.6.1/src/test/regress/results/hs_standby_functions.out    Wed
Nov  7 06:36:47 2018
***************
*** 37,40 ****
  
  -- suicide is painless
  select pg_cancel_backend(pg_backend_pid());
! ERROR:  canceling statement due to user request
--- 37,44 ----

  -- suicide is painless
  select pg_cancel_backend(pg_backend_pid());
!  pg_cancel_backend 
! -------------------
!  t
! (1 row)
! 

======================================================================

In fact, I see the same when I just do in psql (using EnterpriseDB's
PostgreSQL 11 for Windows):

postgres=# select pg_cancel_backend(pg_backend_pid());
ERROR:  canceling statement due to user request
postgres=# select pg_cancel_backend(pg_backend_pid());
ERROR:  canceling statement due to user request
postgres=# select pg_cancel_backend(pg_backend_pid());
ERROR:  canceling statement due to user request
postgres=# select pg_cancel_backend(pg_backend_pid());
 pg_cancel_backend
-------------------
 t
(1 row)


postgres=# select pg_cancel_backend(pg_backend_pid());
 pg_cancel_backend
-------------------
 t
(1 row)


postgres=# select pg_cancel_backend(pg_backend_pid());
ERROR:  canceling statement due to user request
postgres=#

I couldn't reproduce it on Linux, though.
So if it's an expected behaviour, shouldn't the hs_standby_functions check
be fixed?
(I don't understand what is the point of this pg_cancel_backend call.)


Re: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns true sporadically

From
Magnus Hagander
Date:
On Thu, Nov 8, 2018 at 5:31 AM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      15492
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 11.0
Operating system:   Windows 2012 R2
Description:       

When performing `make standbycheck` I get sporadic failure:

============== running regression test queries        ==============
test hs_standby_check         ... ok
test hs_standby_allowed       ... ok
test hs_standby_disallowed    ... ok
test hs_standby_functions     ... FAILED

======================
 1 of 4 tests failed.
======================

***
C:/tmp/postgrespro-standard-10.6.1/src/test/regress/expected/hs_standby_functions.out   Wed
Nov  7 01:14:03 2018
---
C:/tmp/postgrespro-standard-10.6.1/src/test/regress/results/hs_standby_functions.out    Wed
Nov  7 06:36:47 2018
***************
*** 37,40 ****

  -- suicide is painless
  select pg_cancel_backend(pg_backend_pid());
! ERROR:  canceling statement due to user request
--- 37,44 ----

  -- suicide is painless
  select pg_cancel_backend(pg_backend_pid());
!  pg_cancel_backend
! -------------------
!  t
! (1 row)
!

======================================================================

In fact, I see the same when I just do in psql (using EnterpriseDB's
PostgreSQL 11 for Windows):

postgres=# select pg_cancel_backend(pg_backend_pid());
ERROR:  canceling statement due to user request
postgres=# select pg_cancel_backend(pg_backend_pid());
ERROR:  canceling statement due to user request
postgres=# select pg_cancel_backend(pg_backend_pid());
ERROR:  canceling statement due to user request
postgres=# select pg_cancel_backend(pg_backend_pid());
 pg_cancel_backend
-------------------
 t
(1 row)


postgres=# select pg_cancel_backend(pg_backend_pid());
 pg_cancel_backend
-------------------
 t
(1 row)


postgres=# select pg_cancel_backend(pg_backend_pid());
ERROR:  canceling statement due to user request
postgres=#

I couldn't reproduce it on Linux, though.
So if it's an expected behaviour, shouldn't the hs_standby_functions check
be fixed?
(I don't understand what is the point of this pg_cancel_backend call.)

This is clearly a timing thing.

The most common case is that the signal is sent and delivered while the pg_cancel_backend() command is still executed. This is probably "always" happening on Unix due to how signals work.

On Windows, what happens in the case where it returns is that the signal is delivered to the "signal thread" (the separate thread handling our signal emulation), but that thread is not scheduled to run until the pg_cancel_backend() function has actually returned. Thus it returns the value and is then canceled.

That said, I agree with the question -- what is the point of this? pg_cancel_backend(pg_backend_pid()) can surely only ever cancel the pg_cancel_backend call itself, so it seems pointless.

The *comment* talks about suicide, which indicates that maybe the original intention was to use pg_terminate_backend()?  But it has also been i nthere since 2009, so why is this problem only showing up now? 

--

Re: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns truesporadically

From
Michael Paquier
Date:
On Thu, Nov 08, 2018 at 11:31:41AM +0100, Magnus Hagander wrote:
> The *comment* talks about suicide, which indicates that maybe the original
> intention was to use pg_terminate_backend()?  But it has also been i nthere
> since 2009, so why is this problem only showing up now?

Perhaps because the buildfarm does not actually test them on Windows,
because those tests are not run in MSVC scripts even if vcregress.pl is
run manually, and because doing a [install]check-world from the top of
the tree does not trigger them either?  We could get those running
within a TAP test automatically I think.
--
Michael

Attachment

Re: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns true sporadically

From
Thomas Munro
Date:
On Thu, Nov 8, 2018 at 11:31 PM Magnus Hagander <magnus@hagander.net> wrote:
> On Thu, Nov 8, 2018 at 5:31 AM PG Bug reporting form <noreply@postgresql.org> wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference:      15492
>> Logged by:          Alexander Lakhin
>> Email address:      exclusion@gmail.com
>> PostgreSQL version: 11.0
>> Operating system:   Windows 2012 R2
>> Description:
>>
>> When performing `make standbycheck` I get sporadic failure:
>>
>> ============== running regression test queries        ==============
>> test hs_standby_check         ... ok
>> test hs_standby_allowed       ... ok
>> test hs_standby_disallowed    ... ok
>> test hs_standby_functions     ... FAILED
>>
>> ======================
>>  1 of 4 tests failed.
>> ======================
>>
>> ***
>> C:/tmp/postgrespro-standard-10.6.1/src/test/regress/expected/hs_standby_functions.out   Wed
>> Nov  7 01:14:03 2018
>> ---
>> C:/tmp/postgrespro-standard-10.6.1/src/test/regress/results/hs_standby_functions.out    Wed
>> Nov  7 06:36:47 2018
>> ***************
>> *** 37,40 ****
>>
>>   -- suicide is painless
>>   select pg_cancel_backend(pg_backend_pid());
>> ! ERROR:  canceling statement due to user request
>> --- 37,44 ----
>>
>>   -- suicide is painless
>>   select pg_cancel_backend(pg_backend_pid());
>> !  pg_cancel_backend
>> ! -------------------
>> !  t
>> ! (1 row)
>> !
>>
>> ======================================================================
>>
>> In fact, I see the same when I just do in psql (using EnterpriseDB's
>> PostgreSQL 11 for Windows):
>>
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>> ERROR:  canceling statement due to user request
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>> ERROR:  canceling statement due to user request
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>> ERROR:  canceling statement due to user request
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>>  pg_cancel_backend
>> -------------------
>>  t
>> (1 row)
>>
>>
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>>  pg_cancel_backend
>> -------------------
>>  t
>> (1 row)
>>
>>
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>> ERROR:  canceling statement due to user request
>> postgres=#
>>
>> I couldn't reproduce it on Linux, though.
>> So if it's an expected behaviour, shouldn't the hs_standby_functions check
>> be fixed?
>> (I don't understand what is the point of this pg_cancel_backend call.)
>
>
> This is clearly a timing thing.
>
> The most common case is that the signal is sent and delivered while the pg_cancel_backend() command is still
executed.This is probably "always" happening on Unix due to how signals work. 
>
> On Windows, what happens in the case where it returns is that the signal is delivered to the "signal thread" (the
separatethread handling our signal emulation), but that thread is not scheduled to run until the pg_cancel_backend()
functionhas actually returned. Thus it returns the value and is then canceled. 
>
> That said, I agree with the question -- what is the point of this? pg_cancel_backend(pg_backend_pid()) can surely
onlyever cancel the pg_cancel_backend call itself, so it seems pointless. 
>
> The *comment* talks about suicide, which indicates that maybe the original intention was to use
pg_terminate_backend()? But it has also been i nthere since 2009, so why is this problem only showing up now? 

We saw a variant of this problem on appveyor (a Windows build-bot)
when testing Daniel's patch to add an optional message (search for
"timing"), and it was fixed as part of that patch, for the new code in
that patch:

https://www.postgresql.org/message-id/flat/C2C7C3EC-CC5F-44B6-9C78-637C88BD7D14@yesql.se

Perhaps other pre-existing tests need similar treatment?

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns true sporadically

From
Magnus Hagander
Date:
On Fri, Nov 9, 2018 at 2:21 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Nov 08, 2018 at 11:31:41AM +0100, Magnus Hagander wrote:
> The *comment* talks about suicide, which indicates that maybe the original
> intention was to use pg_terminate_backend()?  But it has also been i nthere
> since 2009, so why is this problem only showing up now?

Perhaps because the buildfarm does not actually test them on Windows,
because those tests are not run in MSVC scripts even if vcregress.pl is
run manually, and because doing a [install]check-world from the top of
the tree does not trigger them either?  We could get those running
within a TAP test automatically I think.

Oh, i didn't realize that wasn't run by the buildfarm. Then yeah, it's pretty likely it was simply never run.

-- 

Re: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns true sporadically

From
Magnus Hagander
Date:
On Fri, Nov 9, 2018 at 2:25 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Thu, Nov 8, 2018 at 11:31 PM Magnus Hagander <magnus@hagander.net> wrote:
> On Thu, Nov 8, 2018 at 5:31 AM PG Bug reporting form <noreply@postgresql.org> wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference:      15492
>> Logged by:          Alexander Lakhin
>> Email address:      exclusion@gmail.com
>> PostgreSQL version: 11.0
>> Operating system:   Windows 2012 R2
>> Description:
>>
>> When performing `make standbycheck` I get sporadic failure:
>>
>> ============== running regression test queries        ==============
>> test hs_standby_check         ... ok
>> test hs_standby_allowed       ... ok
>> test hs_standby_disallowed    ... ok
>> test hs_standby_functions     ... FAILED
>>
>> ======================
>>  1 of 4 tests failed.
>> ======================
>>
>> ***
>> C:/tmp/postgrespro-standard-10.6.1/src/test/regress/expected/hs_standby_functions.out   Wed
>> Nov  7 01:14:03 2018
>> ---
>> C:/tmp/postgrespro-standard-10.6.1/src/test/regress/results/hs_standby_functions.out    Wed
>> Nov  7 06:36:47 2018
>> ***************
>> *** 37,40 ****
>>
>>   -- suicide is painless
>>   select pg_cancel_backend(pg_backend_pid());
>> ! ERROR:  canceling statement due to user request
>> --- 37,44 ----
>>
>>   -- suicide is painless
>>   select pg_cancel_backend(pg_backend_pid());
>> !  pg_cancel_backend
>> ! -------------------
>> !  t
>> ! (1 row)
>> !
>>
>> ======================================================================
>>
>> In fact, I see the same when I just do in psql (using EnterpriseDB's
>> PostgreSQL 11 for Windows):
>>
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>> ERROR:  canceling statement due to user request
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>> ERROR:  canceling statement due to user request
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>> ERROR:  canceling statement due to user request
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>>  pg_cancel_backend
>> -------------------
>>  t
>> (1 row)
>>
>>
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>>  pg_cancel_backend
>> -------------------
>>  t
>> (1 row)
>>
>>
>> postgres=# select pg_cancel_backend(pg_backend_pid());
>> ERROR:  canceling statement due to user request
>> postgres=#
>>
>> I couldn't reproduce it on Linux, though.
>> So if it's an expected behaviour, shouldn't the hs_standby_functions check
>> be fixed?
>> (I don't understand what is the point of this pg_cancel_backend call.)
>
>
> This is clearly a timing thing.
>
> The most common case is that the signal is sent and delivered while the pg_cancel_backend() command is still executed. This is probably "always" happening on Unix due to how signals work.
>
> On Windows, what happens in the case where it returns is that the signal is delivered to the "signal thread" (the separate thread handling our signal emulation), but that thread is not scheduled to run until the pg_cancel_backend() function has actually returned. Thus it returns the value and is then canceled.
>
> That said, I agree with the question -- what is the point of this? pg_cancel_backend(pg_backend_pid()) can surely only ever cancel the pg_cancel_backend call itself, so it seems pointless.
>
> The *comment* talks about suicide, which indicates that maybe the original intention was to use pg_terminate_backend()?  But it has also been i nthere since 2009, so why is this problem only showing up now?

We saw a variant of this problem on appveyor (a Windows build-bot)
when testing Daniel's patch to add an optional message (search for
"timing"), and it was fixed as part of that patch, for the new code in
that patch:

https://www.postgresql.org/message-id/flat/C2C7C3EC-CC5F-44B6-9C78-637C88BD7D14@yesql.se

Perhaps other pre-existing tests need similar treatment?

Ah yes, that seems to be the same thing, and yes that seem like a reasonalbe solution. So something like:
+select case
+       when pg_cancel_backend(pg_backend_pid())
+       then pg_sleep(60)
+end;
 
Alexander, can you check to see if making that change solves the issue on your machine?

--
Magnus Hagander <magnus@hagander.net> writes:
> On Fri, Nov 9, 2018 at 2:21 AM Michael Paquier <michael@paquier.xyz> wrote:
>> On Thu, Nov 08, 2018 at 11:31:41AM +0100, Magnus Hagander wrote:
>>> ... why is this problem only showing up now?

> Oh, i didn't realize that wasn't run by the buildfarm. Then yeah, it's
> pretty likely it was simply never run.

IIRC, the reason those tests aren't run by default is that they're
pointless unless executed in a hot-standby configuration.  I'm inclined to
think we should remove them from src/test/regress altogether, and instead
put equivalent checks (for any not-already-covered case) into one of the
TAP tests that sets up such a configuration.  src/test/recovery is
probably the right home.

            regards, tom lane


Re: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns truesporadically

From
Alexander Lakhin
Date:
09.11.2018 17:48, Magnus Hagander wrote:
On Fri, Nov 9, 2018 at 2:25 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
We saw a variant of this problem on appveyor (a Windows build-bot)
when testing Daniel's patch to add an optional message (search for
"timing"), and it was fixed as part of that patch, for the new code in
that patch:

https://www.postgresql.org/message-id/flat/C2C7C3EC-CC5F-44B6-9C78-637C88BD7D14@yesql.se

Perhaps other pre-existing tests need similar treatment?

Ah yes, that seems to be the same thing, and yes that seem like a reasonalbe solution. So something like:
+select case
+       when pg_cancel_backend(pg_backend_pid())
+       then pg_sleep(60)
+end;
 
Alexander, can you check to see if making that change solves the issue on your machine?
Yes, after applying the attached patch, the issue is gone.

Best regards,
Alexander
Attachment

Re: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns truesporadically

From
Alexander Lakhin
Date:
09.11.2018 17:49, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Fri, Nov 9, 2018 at 2:21 AM Michael Paquier <michael@paquier.xyz> wrote:
>>> On Thu, Nov 08, 2018 at 11:31:41AM +0100, Magnus Hagander wrote:
>>>> ... why is this problem only showing up now?
>> Oh, i didn't realize that wasn't run by the buildfarm. Then yeah, it's
>> pretty likely it was simply never run.
> IIRC, the reason those tests aren't run by default is that they're
> pointless unless executed in a hot-standby configuration.  I'm inclined to
> think we should remove them from src/test/regress altogether, and instead
> put equivalent checks (for any not-already-covered case) into one of the
> TAP tests that sets up such a configuration.  src/test/recovery is
> probably the right home.
I tried to use the 'make standycheck' approach for two reasons. First,
it's documented at https://www.postgresql.org/docs/11/regress-run.html
Second, it allows to test a replication between different minor versions
(after some setup).
Will we still have such a possibility, if the tests will be removed?

Best regards,
Alexander



Re: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns truesporadically

From
Michael Paquier
Date:
On Mon, Nov 12, 2018 at 07:01:21AM +0300, Alexander Lakhin wrote:
> I tried to use the 'make standycheck' approach for two reasons. First,
> it's documented at https://www.postgresql.org/docs/11/regress-run.html
> Second, it allows to test a replication between different minor versions
> (after some setup).
> Will we still have such a possibility, if the tests will be removed?

When doing an upgrade the standby needs to go first, but I can see your
point here.  If those tests get moved to a TAP test, then the same set
of binaries would be used for both the standby and the primary, so that
possibility would go away if you would want to set a primary with a
different minor version than the standby.
--
Michael

Attachment