more autovacuum fixes - Mailing list pgsql-patches

From Alvaro Herrera
Subject more autovacuum fixes
Date
Msg-id 20070619174921.GZ4265@alvh.no-ip.org
Whole thread Raw
Responses Re: more autovacuum fixes
Re: more autovacuum fixes
List pgsql-patches
I attach a patch to fix some issues in the autovac process handling
code.  Mainly, instead of rechecking periodically in the launcher
whether a worker was able to start or not, what we do is signal it from
the postmaster when it fails.

In order to do this I had to make the postmaster set a flag in shared
memory, much like the pmsignal.c code does.  According to previous
discussions this is an acceptable thing for postmaster to do.

I must admit I did not try to fill shared memory with garbage and see if
it caused a postmaster crash.  What I did test was creating a database,
making sure that the launcher was picking it up, and then delete the
database's directory.  This correctly causes a worker to log errors when
trying to connect, and it is handled well -- i.e. other databases are
not starved.  (One remaining problem I see is that a database in this
state, or similar, that fails to be vacuumed, *will* cause starvation if
in danger of Xid wraparound).

Another thing I noticed is that I can change the "autovacuum" parameter
in postgresql.conf and send SIGHUP to postmaster, which correctly
starts the launcher if it was previously disabled.  To make the shutdown
case act accordingly, I made the launcher check the "start daemon" flag
after rereading the config file, and quit if it's off.  I tried multiple
combinations of having it set and unset, changing
stats_collect_tuple_level and it works fine.

One problem with the patch is this (new code):

    bn = (Backend *) malloc(sizeof(Backend));
!   if (bn)
    {
!       bn->pid = StartAutoVacWorker();
!       bn->is_autovacuum = true;
!       /* we don't need a cancel key */

!       if (bn->pid > 0)
!       {
!           /* FIXME -- unchecked memory allocation here */
!           DLAddHead(BackendList, DLNewElem(bn));


If the palloc() inside DLNewElem fails, we will fail to report a "fork
failure" to the launcher.  I am not sure how serious this is.  One idea
that came to mind was using a PG_TRY block, sending the signal in the
CATCH block, and then rethrowing the exception.  Is this acceptable?


All in all, this patch increases the reliability of autovacuum, so I
intend to apply it shortly unless there are objections.  Further
improvements might come later as I become aware of possible fixes.

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/DXLWNGRJD34J
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)

Attachment

pgsql-patches by date:

Previous
From: Michael Paesold
Date:
Subject: Re: WIP: rewrite numeric division
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] 'Waiting on lock'