Thread: [HACKERS] pg_terminate_backend can terminate background workers andautovacuum launchers

Hi,

I have found that we can cancel/terminate autovacuum launchers and
background worker processes by pg_cancel/terminate_backend function.
I'm wondering this behavior is not expected and if not I want to fix it.


The current pg_stat_activity shows background workers and autovacuum
lancher as below. It made me come up with the question.

postgres=# select pid, wait_event, backend_type from pg_stat_activity ;
  pid  |     wait_event      |    backend_type     
-------+---------------------+---------------------
 30902 | LogicalLauncherMain | background worker
 30900 | AutoVacuumMain      | autovacuum launcher
 30923 |                     | client backend
 30898 | BgWriterMain        | background writer
 30897 | CheckpointerMain    | checkpointer
 30899 | WalWriterMain       | walwriter
(6 rows)

We cannot use pg_terminate/cancel_backend for most processes
except client backends. For example, when I tried to terminate
the background writer, I got a warning and failed.

postgres=# select pg_terminate_backend(30899);
WARNING:  PID 30899 is not a PostgreSQL server process
 pg_terminate_backend 
----------------------
 f
(1 row)

However, we can terminate background workers by pg_terminate_backend.
In the following example, I terminated the logical replication launcher,
and this process did not appear again[1]. 

postgres=# select pg_terminate_backend(30902);
 pg_terminate_backend 
----------------------
 t
(1 row)

postgres=# select pid, wait_event, backend_type from pg_stat_activity ;
  pid  |    wait_event     |    backend_type     
-------+-------------------+---------------------
 30900 | AutoVacuumMain    | autovacuum launcher
 30923 |                   | client backend
 30898 | BgWriterHibernate | background writer
 30897 | CheckpointerMain  | checkpointer
 30899 | WalWriterMain     | walwriter
(5 rows)

Similarly, we can terminate autovacuum launcher by pg_terminate_backend,
but a new process is restarted by postmaster in this case.[2]

postgres=# select pg_terminate_backend(30900);
 pg_terminate_backend 
----------------------
 t
(1 row)

postgres=# select pid, wait_event, backend_type from pg_stat_activity ;
  pid  |    wait_event     |    backend_type     
-------+-------------------+---------------------
 32483 | AutoVacuumMain    | autovacuum launcher
 30923 |                   | client backend
 30898 | BgWriterHibernate | background writer
 30897 | CheckpointerMain  | checkpointer
 30899 | WalWriterMain     | walwriter
(5 rows)


My question is whether the behavior of pg_terminate/cancel_backend is
expected. If these functions should succeed only for client backends,
we need to fix the behavior. Attached is a patch to fix it in that case.

In my patch, process type is checked in pg_signal_backend(), and if it is
background worker or autovacuum launcher then throw a warning and fail. 
BackendPidGetProc() returns valid PGPROC for proccesses that are initialized
by PostgresInit(), and, in my understand, all such proccess are client
backends, background workers, and autovacuum launcher. So, if this is
neither background woker nor autovacuum launcher, this should be
a normal client backend. For this check, I added a new field,
isAutoVacuumLauncher, to PGPROC.

Any comments would be appreciated.

-----
[1]
AFAIK, we have to restart the server to enable logical replication after this.
I'm not sure this is expected, but I found the following comment in
ProcessInterrupts(). Does "can be stopped at any time" mean that we can
drop this process completely?

2852         else if (IsLogicalLauncher())
2853         {
2854             ereport(DEBUG1,
2855                     (errmsg("logical replication launcher shutting down")));
2856 
2857             /* The logical replication launcher can be stopped at any time. */
2858             proc_exit(0);
2859         }

When the logical replication launcher receive SIGTERM, this exits with exitstatus 0,
so this is not restarted by the postmaster.

[2]
On the other hand, when we use pg_cancel_backend for autovacuum launcher,
it causes the following error. I'll report the detail in another thread.

 ERROR:  can't attach the same segment more than once

-----

Regards,

-- 
Yugo Nagata <nagata@sraoss.co.jp>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> I have found that we can cancel/terminate autovacuum launchers and
> background worker processes by pg_cancel/terminate_backend function.
> I'm wondering this behavior is not expected and if not I want to fix it.

I think it is expected.  Even if we blocked it, those processes have
to cope gracefully with SIGTERM, because anyone with access to the OS
user can kill them that way by hand.

> However, we can terminate background workers by pg_terminate_backend.
> In the following example, I terminated the logical replication launcher,
> and this process did not appear again[1].
>
> postgres=# select pg_terminate_backend(30902);
>  pg_terminate_backend
> ----------------------
>  t
> (1 row)

That seems to be a bug in logical replication.

> Similarly, we can terminate autovacuum launcher by pg_terminate_backend,
> but a new process is restarted by postmaster in this case.[2]
>
> postgres=# select pg_terminate_backend(30900);
>  pg_terminate_backend
> ----------------------
>  t
> (1 row)

That is as I would expect.

> [2]
> On the other hand, when we use pg_cancel_backend for autovacuum launcher,
> it causes the following error. I'll report the detail in another thread.
>
>  ERROR:  can't attach the same segment more than once

I think that's a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Yugo Nagata wrote:

> However, we can terminate background workers by pg_terminate_backend.
> In the following example, I terminated the logical replication launcher,
> and this process did not appear again[1]. 
> 
> postgres=# select pg_terminate_backend(30902);
>  pg_terminate_backend 
> ----------------------
>  t
> (1 row)

I think failing to restart the replication launcher after it stops is
probably a bug, and should be fixed by having postmaster start another
one if it dies.

> Similarly, we can terminate autovacuum launcher by pg_terminate_backend,
> but a new process is restarted by postmaster in this case.[2]

Yeah, this is operating as intended.  You can turn autovacuum off and
the launcher should go away, and turn it back on and launcher should
start.  So we expect the autovac launcher to respond to signals.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Wed, 21 Jun 2017 11:04:34 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > I have found that we can cancel/terminate autovacuum launchers and
> > background worker processes by pg_cancel/terminate_backend function.
> > I'm wondering this behavior is not expected and if not I want to fix it.
> 
> I think it is expected.  Even if we blocked it, those processes have
> to cope gracefully with SIGTERM, because anyone with access to the OS
> user can kill them that way by hand.

I agree that we can kill theses processes by the OS command. However,
It seems to me that pg_{cancel,terminate}_backend don't need to be able to
kill processes except for client backends because we can do same thing by
the OS command if necessary, and acutually these functions cannot kill
most other processes, for example, background writer. Are the autovacuum
launcher and background worker special for these functions?

> 
> > However, we can terminate background workers by pg_terminate_backend.
> > In the following example, I terminated the logical replication launcher,
> > and this process did not appear again[1].
> >
> > postgres=# select pg_terminate_backend(30902);
> >  pg_terminate_backend
> > ----------------------
> >  t
> > (1 row)
> 
> That seems to be a bug in logical replication.
> 
> > Similarly, we can terminate autovacuum launcher by pg_terminate_backend,
> > but a new process is restarted by postmaster in this case.[2]
> >
> > postgres=# select pg_terminate_backend(30900);
> >  pg_terminate_backend
> > ----------------------
> >  t
> > (1 row)
> 
> That is as I would expect.
> 
> > [2]
> > On the other hand, when we use pg_cancel_backend for autovacuum launcher,
> > it causes the following error. I'll report the detail in another thread.
> >
> >  ERROR:  can't attach the same segment more than once
> 
> I think that's a bug.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


-- 
Yugo Nagata <nagata@sraoss.co.jp>



Re: [HACKERS] pg_terminate_backend can terminate background workersand autovacuum launchers

From
andres@anarazel.de (Andres Freund)
Date:
On 2017-06-22 11:49:47 +0900, Yugo Nagata wrote:
> On Wed, 21 Jun 2017 11:04:34 -0400
> Robert Haas <robertmhaas@gmail.com> wrote:
> 
> > On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > > I have found that we can cancel/terminate autovacuum launchers and
> > > background worker processes by pg_cancel/terminate_backend function.
> > > I'm wondering this behavior is not expected and if not I want to fix it.
> > 
> > I think it is expected.  Even if we blocked it, those processes have
> > to cope gracefully with SIGTERM, because anyone with access to the OS
> > user can kill them that way by hand.
> 
> I agree that we can kill theses processes by the OS command. However,
> It seems to me that pg_{cancel,terminate}_backend don't need to be able to
> kill processes except for client backends because we can do same thing by
> the OS command if necessary, and acutually these functions cannot kill
> most other processes, for example, background writer. Are the autovacuum
> launcher and background worker special for these functions?

I strongly disagree with this - I think it's quite useful to be able to
kill things via SQL that can hold lock on database objects.   I'm not
seeing which problem would be solved by prohibiting this?

- Andres



On Thu, Jun 22, 2017 at 11:52 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-06-22 11:49:47 +0900, Yugo Nagata wrote:
>> I agree that we can kill theses processes by the OS command. However,
>> It seems to me that pg_{cancel,terminate}_backend don't need to be able to
>> kill processes except for client backends because we can do same thing by
>> the OS command if necessary, and acutually these functions cannot kill
>> most other processes, for example, background writer. Are the autovacuum
>> launcher and background worker special for these functions?
>
> I strongly disagree with this - I think it's quite useful to be able to
> kill things via SQL that can hold lock on database objects.   I'm not
> seeing which problem would be solved by prohibiting this?

+1. Controlling them as of now is useful, particularly now that all
processes show wait events, even auxiliary ones (though not
signal-able). Different thought, but I'd love to see a SQL function
that allows triggering SIGHUP on a specific process, like an
autovacuum worker to change its cost parameters. There is
pg_reload_conf() to do so but that's system-wide.
-- 
Michael



On Thu, 22 Jun 2017 12:05:19 +0900
Michael Paquier <michael.paquier@gmail.com> wrote:

> On Thu, Jun 22, 2017 at 11:52 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-06-22 11:49:47 +0900, Yugo Nagata wrote:
> >> I agree that we can kill theses processes by the OS command. However,
> >> It seems to me that pg_{cancel,terminate}_backend don't need to be able to
> >> kill processes except for client backends because we can do same thing by
> >> the OS command if necessary, and acutually these functions cannot kill
> >> most other processes, for example, background writer. Are the autovacuum
> >> launcher and background worker special for these functions?
> >
> > I strongly disagree with this - I think it's quite useful to be able to
> > kill things via SQL that can hold lock on database objects.   I'm not
> > seeing which problem would be solved by prohibiting this?
> 
> +1. Controlling them as of now is useful, particularly now that all
> processes show wait events, even auxiliary ones (though not

I got it. I agree that the current behaviour and it isn't worth changint it.
In my understand, processes that the functions can kill (client backends,
background workers, autovacuum processes) are ones that can hold lock
on database objects.

> signal-able). Different thought, but I'd love to see a SQL function
> that allows triggering SIGHUP on a specific process, like an
> autovacuum worker to change its cost parameters. There is
> pg_reload_conf() to do so but that's system-wide.

For example, is that like this?

=# alter system set autovacuum_vacuum_cost_delay to 10;
=# select pg_reload_conf(<PID of autovacuum worker)>);
=# alter system reset autovacuum_vacuum_cost_delay;

> -- 
> Michael


-- 
Yugo Nagata <nagata@sraoss.co.jp>



On Thu, Jun 22, 2017 at 1:52 PM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> On Thu, 22 Jun 2017 12:05:19 +0900
> Michael Paquier <michael.paquier@gmail.com> wrote:
>> signal-able). Different thought, but I'd love to see a SQL function
>> that allows triggering SIGHUP on a specific process, like an
>> autovacuum worker to change its cost parameters. There is
>> pg_reload_conf() to do so but that's system-wide.
>
> For example, is that like this?
>
> =# alter system set autovacuum_vacuum_cost_delay to 10;
> =# select pg_reload_conf(<PID of autovacuum worker)>);
> =# alter system reset autovacuum_vacuum_cost_delay;

No need to reset the parameter afterwards as this makes it sensible to
updates afterwards, but you have the idea. Note that this is rather
recent, as autovacuum listens to SIGHUP only since a75fb9b3.
-- 
Michael



On Thu, 22 Jun 2017 14:08:30 +0900
Michael Paquier <michael.paquier@gmail.com> wrote:

> On Thu, Jun 22, 2017 at 1:52 PM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > On Thu, 22 Jun 2017 12:05:19 +0900
> > Michael Paquier <michael.paquier@gmail.com> wrote:
> >> signal-able). Different thought, but I'd love to see a SQL function
> >> that allows triggering SIGHUP on a specific process, like an
> >> autovacuum worker to change its cost parameters. There is
> >> pg_reload_conf() to do so but that's system-wide.
> >
> > For example, is that like this?
> >
> > =# alter system set autovacuum_vacuum_cost_delay to 10;
> > =# select pg_reload_conf(<PID of autovacuum worker)>);
> > =# alter system reset autovacuum_vacuum_cost_delay;
> 
> No need to reset the parameter afterwards as this makes it sensible to
> updates afterwards, but you have the idea. Note that this is rather
> recent, as autovacuum listens to SIGHUP only since a75fb9b3.

I tried to make it. Patch attached. 

It is easy and simple. Although I haven't tried for autovacuum worker,
I confirmed I can change other process' parameters without affecting others.
Do you want this in PG?


[session A (PID:18375)]
=# show autovacuum_vacuum_cost_delay;
 autovacuum_vacuum_cost_delay 
------------------------------
 20ms
(1 row)


[session B]
postgres=# alter system set autovacuum_vacuum_cost_delay to 10;
ALTER SYSTEM
postgres=# select pg_reload_conf(18375);
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# show autovacuum_vacuum_cost_delay;
 autovacuum_vacuum_cost_delay 
------------------------------
 20ms
(1 row)


[session A again]
postgres=# show autovacuum_vacuum_cost_delay;
 autovacuum_vacuum_cost_delay 
------------------------------
 10ms
(1 row)



> -- 
> Michael


-- 
Yugo Nagata <nagata@sraoss.co.jp>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
Yugo Nagata wrote:

> I tried to make it. Patch attached. 
> 
> It is easy and simple. Although I haven't tried for autovacuum worker,
> I confirmed I can change other process' parameters without affecting others.
> Do you want this in PG?

One advantage of this gadget is that you can signal backends that you
own without being superuser, so +1 for the general idea.  (Please do
create a fresh thread so that this can be appropriately linked to in
commitfest app, though.)

You need a much bigger patch for the autovacuum worker.  The use case I
had in mind is that you have a maintenance window and can run fast
vacuuming during it, but if the table is large and vacuum can't finish
within that window then you want vacuum to slow down, without stopping
it completely.  But implementing this requires juggling the
stdVacuumCostDelay/Limit values during the reload, which are currently
read at start of vacuuming only; the working values are overwritten from
those during a rebalance.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Sat, Jun 24, 2017 at 5:07 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Yugo Nagata wrote:
>
>> I tried to make it. Patch attached.
>>
>> It is easy and simple. Although I haven't tried for autovacuum worker,
>> I confirmed I can change other process' parameters without affecting others.
>> Do you want this in PG?

Just browsing the patch...

+    if (r == SIGNAL_BACKEND_NOSUPERUSER)
+        ereport(WARNING,
+                (errmsg("must be a superuser to terminate superuser
process")));
+
+    if (r == SIGNAL_BACKEND_NOPERMISSION)
+        ereport(WARNING,
+                 (errmsg("must be a member of the role whose process
is being terminated or member of pg_signal_backend")));
Both messages are incorrect. This is not trying to terminate a process.

+Datum
+pg_reload_conf_pid(PG_FUNCTION_ARGS)
I think that the naming is closer to pg_reload_backend.

Documentation is needed as well.

> One advantage of this gadget is that you can signal backends that you
> own without being superuser, so +1 for the general idea.  (Please do
> create a fresh thread so that this can be appropriately linked to in
> commitfest app, though.)

That would be nice.

> You need a much bigger patch for the autovacuum worker.  The use case I
> had in mind is that you have a maintenance window and can run fast
> vacuuming during it, but if the table is large and vacuum can't finish
> within that window then you want vacuum to slow down, without stopping
> it completely.  But implementing this requires juggling the
> stdVacuumCostDelay/Limit values during the reload, which are currently
> read at start of vacuuming only; the working values are overwritten from
> those during a rebalance.

Yeah, that's independent from the patch above.
-- 
Michael



Alvaro, all,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> Yugo Nagata wrote:
>
> > I tried to make it. Patch attached.
> >
> > It is easy and simple. Although I haven't tried for autovacuum worker,
> > I confirmed I can change other process' parameters without affecting others.
> > Do you want this in PG?
>
> One advantage of this gadget is that you can signal backends that you
> own without being superuser, so +1 for the general idea.  (Please do
> create a fresh thread so that this can be appropriately linked to in
> commitfest app, though.)

Well, that wouldn't quite work with the proposed patch because the
proposed patch REVOKE's execute from public...

I'm trying to figure out how it's actually useful to be able to signal a
backend that you own about a config change that you *aren't* able to
make without being a superuser..  Now, if you could tell the other
backend to use a new value for some USERSET GUC, then *that* would be
useful and interesting.

I haven't got any particularly great ideas for how to make that happen
though.

> You need a much bigger patch for the autovacuum worker.  The use case I
> had in mind is that you have a maintenance window and can run fast
> vacuuming during it, but if the table is large and vacuum can't finish
> within that window then you want vacuum to slow down, without stopping
> it completely.  But implementing this requires juggling the
> stdVacuumCostDelay/Limit values during the reload, which are currently
> read at start of vacuuming only; the working values are overwritten from
> those during a rebalance.

Being able to change those values during a vacuuming run would certainly
be useful, I've had cases where I would have liked to have been able to
do just exactly that.  That's largely independent of this though.

Thanks!

Stephen

On Sat, 24 Jun 2017 08:09:52 +0900
Michael Paquier <michael.paquier@gmail.com> wrote:

> On Sat, Jun 24, 2017 at 5:07 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Yugo Nagata wrote:
> >
> >> I tried to make it. Patch attached.
> >>
> >> It is easy and simple. Although I haven't tried for autovacuum worker,
> >> I confirmed I can change other process' parameters without affecting others.
> >> Do you want this in PG?
> 
> Just browsing the patch...
> 
> +    if (r == SIGNAL_BACKEND_NOSUPERUSER)
> +        ereport(WARNING,
> +                (errmsg("must be a superuser to terminate superuser
> process")));
> +
> +    if (r == SIGNAL_BACKEND_NOPERMISSION)
> +        ereport(WARNING,
> +                 (errmsg("must be a member of the role whose process
> is being terminated or member of pg_signal_backend")));
> Both messages are incorrect. This is not trying to terminate a process.

I'll fix this.

> 
> +Datum
> +pg_reload_conf_pid(PG_FUNCTION_ARGS)
> I think that the naming is closer to pg_reload_backend.
> 
> Documentation is needed as well.

Agree with pg_reload_backend and I'll write the documetation.

> 
> > One advantage of this gadget is that you can signal backends that you
> > own without being superuser, so +1 for the general idea.  (Please do
> > create a fresh thread so that this can be appropriately linked to in
> > commitfest app, though.)
> 
> That would be nice.

Sure. I'll create a new thread and attach the new patch to it.

> 
> > You need a much bigger patch for the autovacuum worker.  The use case I
> > had in mind is that you have a maintenance window and can run fast
> > vacuuming during it, but if the table is large and vacuum can't finish
> > within that window then you want vacuum to slow down, without stopping
> > it completely.  But implementing this requires juggling the
> > stdVacuumCostDelay/Limit values during the reload, which are currently
> > read at start of vacuuming only; the working values are overwritten from
> > those during a rebalance.
> 
> Yeah, that's independent from the patch above.

Agree. It would be another feature from pg_reload_backend and
I think it would be implmented as another patch.



> -- 
> Michael
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Yugo Nagata <nagata@sraoss.co.jp>



On Fri, 23 Jun 2017 19:43:35 -0400
Stephen Frost <sfrost@snowman.net> wrote:

> Alvaro, all,
> 
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> > Yugo Nagata wrote:
> > 
> > > I tried to make it. Patch attached. 
> > > 
> > > It is easy and simple. Although I haven't tried for autovacuum worker,
> > > I confirmed I can change other process' parameters without affecting others.
> > > Do you want this in PG?
> > 
> > One advantage of this gadget is that you can signal backends that you
> > own without being superuser, so +1 for the general idea.  (Please do
> > create a fresh thread so that this can be appropriately linked to in
> > commitfest app, though.)
> 
> Well, that wouldn't quite work with the proposed patch because the
> proposed patch REVOKE's execute from public...

Yes. It is intentional to revoke execute from public because we need
to change configuration before signaling other backend and it needs
superuser privilege.
> I'm trying to figure out how it's actually useful to be able to signal a
> backend that you own about a config change that you *aren't* able to
> make without being a superuser..  Now, if you could tell the other
> backend to use a new value for some USERSET GUC, then *that* would be
> useful and interesting.
> 
> I haven't got any particularly great ideas for how to make that happen
> though.
> 
> > You need a much bigger patch for the autovacuum worker.  The use case I
> > had in mind is that you have a maintenance window and can run fast
> > vacuuming during it, but if the table is large and vacuum can't finish
> > within that window then you want vacuum to slow down, without stopping
> > it completely.  But implementing this requires juggling the
> > stdVacuumCostDelay/Limit values during the reload, which are currently
> > read at start of vacuuming only; the working values are overwritten from
> > those during a rebalance.
> 
> Being able to change those values during a vacuuming run would certainly
> be useful, I've had cases where I would have liked to have been able to
> do just exactly that.  That's largely independent of this though.
> 
> Thanks!
> 
> Stephen


-- 
Yugo Nagata <nagata@sraoss.co.jp>