Thread: Cancel autovacuum conflicting with DROP TABLE

Cancel autovacuum conflicting with DROP TABLE

From
ITAGAKI Takahiro
Date:
Here is a patch that cancels autovacuum workers conflicting with
DROP TABLE, TRUNCATE and CLUSTER. It was discussed here:
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php

Before backends drop a table, they search autovacuum workers that
are running on the table. If found, they send SIGINT signal to
the autovacuum to avoid waiting for the worthless vacuum.

When an autovacuum worker receives SIGINT, it skips only the vacuuming
table and continues the remaining jobs. Now SIGINT and SIGTERM have
different meanings and their logs are changed.

  SIGINT -- Cancel the current job.
      DEBUG: autovacuum on relation <schema>.<table> is canceled
  SIGTERM -- Cancel all jobs.
      FATAL: terminating autovacuum due to administrator command

We can see the second SIGTERM log on shutdown and DROP DATABASE.
In such case, we send SIGTERM to autovacuums instead of SIGINT.
I'd like to bring down the error level to WARNING or less actually,
but I have no idea to do it because canceling queries and error levels
are tightly coupled in the present implementation.
Instead, I added the special FATAL message for autovacuum workers
so that we can distinguish the lines are ignorable.

Comments welcome.

---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment

Re: Cancel autovacuum conflicting with DROP TABLE

From
ITAGAKI Takahiro
Date:
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:

> Here is a patch that cancels autovacuum workers conflicting with
> DROP TABLE, TRUNCATE and CLUSTER. It was discussed here:
> http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php

I made an adjustment for the latest 'more autovacuum fixes' patch.
http://archives.postgresql.org/pgsql-patches/2007-06/msg00217.php

Now, autovacuum workers handles ProcDie signals using ERROR
instead of FATAL. The exception is caught by the worker and
converted to the following logs.

  SIGINT -- Cancel the current job.
      LOG: autovacuum on <db>.<schema>.<table> is canceled
  SIGTERM -- Cancel all jobs.
      LOG: autovacuum on <db> is canceled

We are planning to ship 8.3 with autovacuum=on, so users will be
more likely to see conflicts between autovacuum and their commands.
This makes autovacuum more gentle.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment

Re: Cancel autovacuum conflicting with DROP TABLE

From
Alvaro Herrera
Date:
ITAGAKI Takahiro wrote:
> ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> > Here is a patch that cancels autovacuum workers conflicting with
> > DROP TABLE, TRUNCATE and CLUSTER. It was discussed here:
> > http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php
>
> I made an adjustment for the latest 'more autovacuum fixes' patch.
> http://archives.postgresql.org/pgsql-patches/2007-06/msg00217.php
>
> Now, autovacuum workers handles ProcDie signals using ERROR
> instead of FATAL. The exception is caught by the worker and
> converted to the following logs.
>
>   SIGINT -- Cancel the current job.
>       LOG: autovacuum on <db>.<schema>.<table> is canceled
>   SIGTERM -- Cancel all jobs.
>       LOG: autovacuum on <db> is canceled

Thanks for the patch.  I think there are actually three patches in here:

1. changing SIGINT so that it cancels the current table instead of
shutting down the entire worker.

2. changing DROP TABLE and TRUNCATE so that they cancel an autovac
worker by sending SIGINT.

3. change the interrupt code so that autovac workers are terminated with
ERROR instead of FATAL, knowing that the final outcome is the same.  If
I'm not mistaken the only point of the change is to be able to change
the error message, is that correct?


I don't feel very much comfortable with the patch (3).  I would prefer
that we keep errcode FATAL and select a different error message in
ProcessInterrupts instead.  I don't see the point in complicating the
sigjmp target without need.

I agree with the (2) change.

The change in (1) I don't feel comfortable with commenting.  It seems
strange to me, and although it seems like it would be safe, I cannot
help having a strange feeling about it.  I'll try to digest it a bit
more.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Cancel autovacuum conflicting with DROP TABLE

From
ITAGAKI Takahiro
Date:
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> 1. changing SIGINT so that it cancels the current table instead of
> shutting down the entire worker.
>
> 2. changing DROP TABLE and TRUNCATE so that they cancel an autovac
> worker by sending SIGINT.

Quite so.

> 3. change the interrupt code so that autovac workers are terminated with
> ERROR instead of FATAL, knowing that the final outcome is the same.  If
> I'm not mistaken the only point of the change is to be able to change
> the error message, is that correct?

Yes, I used ERROR instead of FATAL just for changing the message.
In addition, the actual message level will be LOG, instead of FATAL.

> I don't feel very much comfortable with the patch (3).  I would prefer
> that we keep errcode FATAL and select a different error message in
> ProcessInterrupts instead.  I don't see the point in complicating the
> sigjmp target without need.

My first patch did so and I changed it in the second patch because
I feel the FATAL entry in syslog is too obtrusive; Internal termination
of autovacuum workers in this way is ignorable for normal users.
However, I'm sure that my complaint is not so important. Please don't
apply thie part of the patch if you don't like it.


> I agree with the (2) change.
>
> The change in (1) I don't feel comfortable with commenting.  It seems
> strange to me, and although it seems like it would be safe, I cannot
> help having a strange feeling about it.  I'll try to digest it a bit
> more.

Thanks. I forgot to adjust comments in the code, sorry.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: Cancel autovacuum conflicting with DROP TABLE

From
Alvaro Herrera
Date:
ITAGAKI Takahiro wrote:
>
> Alvaro Herrera <alvherre@commandprompt.com> wrote:
>
> > 1. changing SIGINT so that it cancels the current table instead of
> > shutting down the entire worker.

I applied this part of the patch, thanks.

In passing I noticed that apparently we are leaking memory, because the
vacuum memory context is created with a parent of PortalContext, which
ISTM to be NULL in autovacuum.  So when we cancel the vacuuming work, we
never delete or reset that context.

I think what we should be doing is creating a context to act as
PortalContext, and reset it after each vacuuming operation.

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")