Thread: Cancel autovacuum conflicting with DROP TABLE
Here is a patch that cancels autovacuum workers conflicting with DROP TABLE, TRUNCATE and CLUSTER. It was discussed here: http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php Before backends drop a table, they search autovacuum workers that are running on the table. If found, they send SIGINT signal to the autovacuum to avoid waiting for the worthless vacuum. When an autovacuum worker receives SIGINT, it skips only the vacuuming table and continues the remaining jobs. Now SIGINT and SIGTERM have different meanings and their logs are changed. SIGINT -- Cancel the current job. DEBUG: autovacuum on relation <schema>.<table> is canceled SIGTERM -- Cancel all jobs. FATAL: terminating autovacuum due to administrator command We can see the second SIGTERM log on shutdown and DROP DATABASE. In such case, we send SIGTERM to autovacuums instead of SIGINT. I'd like to bring down the error level to WARNING or less actually, but I have no idea to do it because canceling queries and error levels are tightly coupled in the present implementation. Instead, I added the special FATAL message for autovacuum workers so that we can distinguish the lines are ignorable. Comments welcome. --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote: > Here is a patch that cancels autovacuum workers conflicting with > DROP TABLE, TRUNCATE and CLUSTER. It was discussed here: > http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php I made an adjustment for the latest 'more autovacuum fixes' patch. http://archives.postgresql.org/pgsql-patches/2007-06/msg00217.php Now, autovacuum workers handles ProcDie signals using ERROR instead of FATAL. The exception is caught by the worker and converted to the following logs. SIGINT -- Cancel the current job. LOG: autovacuum on <db>.<schema>.<table> is canceled SIGTERM -- Cancel all jobs. LOG: autovacuum on <db> is canceled We are planning to ship 8.3 with autovacuum=on, so users will be more likely to see conflicts between autovacuum and their commands. This makes autovacuum more gentle. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
ITAGAKI Takahiro wrote: > ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote: > > > Here is a patch that cancels autovacuum workers conflicting with > > DROP TABLE, TRUNCATE and CLUSTER. It was discussed here: > > http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php > > I made an adjustment for the latest 'more autovacuum fixes' patch. > http://archives.postgresql.org/pgsql-patches/2007-06/msg00217.php > > Now, autovacuum workers handles ProcDie signals using ERROR > instead of FATAL. The exception is caught by the worker and > converted to the following logs. > > SIGINT -- Cancel the current job. > LOG: autovacuum on <db>.<schema>.<table> is canceled > SIGTERM -- Cancel all jobs. > LOG: autovacuum on <db> is canceled Thanks for the patch. I think there are actually three patches in here: 1. changing SIGINT so that it cancels the current table instead of shutting down the entire worker. 2. changing DROP TABLE and TRUNCATE so that they cancel an autovac worker by sending SIGINT. 3. change the interrupt code so that autovac workers are terminated with ERROR instead of FATAL, knowing that the final outcome is the same. If I'm not mistaken the only point of the change is to be able to change the error message, is that correct? I don't feel very much comfortable with the patch (3). I would prefer that we keep errcode FATAL and select a different error message in ProcessInterrupts instead. I don't see the point in complicating the sigjmp target without need. I agree with the (2) change. The change in (1) I don't feel comfortable with commenting. It seems strange to me, and although it seems like it would be safe, I cannot help having a strange feeling about it. I'll try to digest it a bit more. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote: > 1. changing SIGINT so that it cancels the current table instead of > shutting down the entire worker. > > 2. changing DROP TABLE and TRUNCATE so that they cancel an autovac > worker by sending SIGINT. Quite so. > 3. change the interrupt code so that autovac workers are terminated with > ERROR instead of FATAL, knowing that the final outcome is the same. If > I'm not mistaken the only point of the change is to be able to change > the error message, is that correct? Yes, I used ERROR instead of FATAL just for changing the message. In addition, the actual message level will be LOG, instead of FATAL. > I don't feel very much comfortable with the patch (3). I would prefer > that we keep errcode FATAL and select a different error message in > ProcessInterrupts instead. I don't see the point in complicating the > sigjmp target without need. My first patch did so and I changed it in the second patch because I feel the FATAL entry in syslog is too obtrusive; Internal termination of autovacuum workers in this way is ignorable for normal users. However, I'm sure that my complaint is not so important. Please don't apply thie part of the patch if you don't like it. > I agree with the (2) change. > > The change in (1) I don't feel comfortable with commenting. It seems > strange to me, and although it seems like it would be safe, I cannot > help having a strange feeling about it. I'll try to digest it a bit > more. Thanks. I forgot to adjust comments in the code, sorry. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
ITAGAKI Takahiro wrote: > > Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > 1. changing SIGINT so that it cancels the current table instead of > > shutting down the entire worker. I applied this part of the patch, thanks. In passing I noticed that apparently we are leaking memory, because the vacuum memory context is created with a parent of PortalContext, which ISTM to be NULL in autovacuum. So when we cancel the vacuuming work, we never delete or reset that context. I think what we should be doing is creating a context to act as PortalContext, and reset it after each vacuuming operation. -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC "El que vive para el futuro es un iluso, y el que vive para el pasado, un imbécil" (Luis Adler, "Los tripulantes de la noche")