Re: autovauum integration patch: Attempt #4 - Mailing list pgsql-patches

From Tom Lane
Subject Re: autovauum integration patch: Attempt #4
Msg-id 16462.1091489171@sss.pgh.pa.us
In response to Re: autovauum integration patch: Attempt #4  ("Matthew T. O'Connor" <matthew@zeut.net>)
List pgsql-patches
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> Please apply to CVS or tell me what I need to change to get it applied.

I looked over this patch (sorry for the delay), and found a number of
problems.

Bigger problems:

* I don't think you've thought through system shutdown at all.  The
postmaster code as given will not try to shut down the autovac daemon
until the same time it shuts down the bgwriter, which is no good for
a couple of reasons:

(a) None of this code will trigger as long as there is any ordinary
backend running; like say the one(s) invoked by autovac itself.
This means that autovac-driven vacuuming is considered just as good a
reason to keep the system going as an ordinary user query, which I think
is not cool.  IMHO a SIGTERM to the postmaster should result in
canceling whatever autovacuum operation is currently running.

(b) It's really not cool that autovac isn't shut down *before* we start
shutting down bgwriter.  I think that it might not make too much
difference right at the moment, since the autovac daemon isn't actually
making any use of its connection to shared memory, but the moment that
autovac tries to do anything on its own behalf rather than via a backend
this is going to be a serious risk.  There can't be anything going on in
parallel with the shutdown checkpoint.

I think really we want autovac to shut down at the beginning of the
shutdown cycle, not the end, and not to start bgwriter shutdown until
autovac is gone.
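
A minimal sketch of the shutdown ordering described above, written as a standalone model and not as actual postmaster.c code. The phase names, the PostmasterModel struct, and both functions are hypothetical and exist only for illustration. The point it shows is that the bgwriter phase, which performs the shutdown checkpoint, cannot be entered while the autovac daemon is still alive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shutdown phases; illustrative names, not postmaster.c's. */
typedef enum
{
    PHASE_RUNNING,
    PHASE_STOP_AUTOVAC,     /* autovac signaled, waiting for it to exit */
    PHASE_STOP_BACKENDS,    /* waiting for ordinary backends to finish */
    PHASE_STOP_BGWRITER,    /* shutdown checkpoint in progress */
    PHASE_DONE
} ShutdownPhase;

/* Hypothetical model of the children the postmaster is tracking. */
typedef struct
{
    ShutdownPhase phase;
    bool          autovac_alive;
    int           backends_alive;
    bool          bgwriter_alive;
} PostmasterModel;

/* Shutdown request (e.g. SIGTERM): autovac is the first thing to go. */
void
request_shutdown(PostmasterModel *pm)
{
    if (pm->phase == PHASE_RUNNING)
        pm->phase = pm->autovac_alive ? PHASE_STOP_AUTOVAC
                                      : PHASE_STOP_BACKENDS;
}

/* Advance as far as the current child state allows; call after each child exit. */
void
advance_shutdown(PostmasterModel *pm)
{
    if (pm->phase == PHASE_STOP_AUTOVAC && !pm->autovac_alive)
        pm->phase = PHASE_STOP_BACKENDS;
    if (pm->phase == PHASE_STOP_BACKENDS && pm->backends_alive == 0)
        pm->phase = PHASE_STOP_BGWRITER;

    /* Nothing may run in parallel with the shutdown checkpoint. */
    assert(!(pm->phase == PHASE_STOP_BGWRITER && pm->autovac_alive));

    if (pm->phase == PHASE_STOP_BGWRITER && !pm->bgwriter_alive)
        pm->phase = PHASE_DONE;
}
```

In this model the postmaster would call request_shutdown() from its signal handling and advance_shutdown() each time it reaps a child. A real implementation would also have to cancel the backends that autovac itself started, which the model leaves out.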

* I hadn't quite focused before on the fact that this patch requires
statically binding frontend libpq into the backend.  There are a number
of issues with that, the most risky being that if libpq.so is compiled
thread-aware then it's going to create problems on platforms where
there's a difference between thread-aware and non-thread-aware C library
code.  Even without threading worries there are conflicts: if someone
calls a dllist.c routine, which instance will they get?  I'm also
concerned about the implications for modules like contrib/dblink, which
expect to load libpq.so dynamically.  There could be conflicts between
the dynamically linked libpq and the inbuilt one.

* AFAICS there is no provision for setting pg_user or pg_user_password,
which means that the daemon won't actually be able to connect in
non-TRUST environments.  I don't know what we do about this: putting
such a password in a GUC variable is right out (because any user could
read it) and putting it on the postmaster command line is no better
(anyone on the same machine can see it).  Right at the moment we do not
have any secure place for postmaster parameters.

Smaller problems:

* It still contains too much code copied-and-pasted from bgwriter,
such as
            ShutdownXLOG(0, 0);
            DumpFreeSpaceMap(0, 0);
autovac has *no* business doing that.  I don't have time to go through
it line-by-line to see what else shouldn't have been copied.

* The patch reverts some recent changes in postmaster.c (write_stderr
changes for instance).

* Emitting this warning on every single iteration of the postmaster idle
loop is excessive:
    elog(WARNING, "pg_autovacuum: autovac is enabled, but requires stats_row_level which is not enabled");
and this one even more so:
    if (!autovacuum_start_daemon)
        elog(DEBUG1, "pg_autovacuum: not enabled");
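
One way to avoid repeating the warning on every pass is to remember that it has already been issued, roughly as in the sketch below. This is a standalone illustration, not code from the patch: check_autovac_prereqs() and the two static variables are hypothetical names, and fprintf() stands in for the server's own logging call.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state: has the misconfiguration already been reported? */
static bool warned_stats_row_level = false;
static int  warnings_emitted = 0;

/*
 * Called once per idle-loop iteration.  Emits the warning only on the
 * first iteration in which the misconfiguration is seen, and re-arms
 * once the configuration becomes valid again.  Returns true if a
 * warning was emitted by this call.
 */
bool
check_autovac_prereqs(bool autovac_enabled, bool stats_row_level)
{
    if (!autovac_enabled || stats_row_level)
    {
        warned_stats_row_level = false;     /* nothing to complain about */
        return false;
    }

    if (warned_stats_row_level)
        return false;                       /* already reported, stay quiet */

    fprintf(stderr,
            "WARNING: autovacuum is enabled but requires stats_row_level, "
            "which is not enabled\n");
    warned_stats_row_level = true;
    warnings_emitted++;
    return true;
}
```

The flag is reset when the settings become valid, so a later misconfiguration (for example after a configuration reload) would be reported once more and not silently ignored.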

* Any message that's actually likely to be seen by users should be an
ereport, not elog.  elog is okay for debugging messages and can't-happen
errors, but not for messages that will be seen by ordinary users,
because there's no support for message translation in elog.

* I don't think you've actually thought very carefully about elog-based
error handling; for instance this bit:
            dbs->conn = db_connect(dbs);
            if (dbs->conn == NULL)
            {                    /* Serious problem: We can't connect to
                                 * template1 */
                elog(ERROR, "pg_autovacuum: Cannot connect to template1, exiting.");
                exit(1);
            }
Control won't make it to the exit(1)  (which is a darn good thing in
itself, seeing that you are connected to shared memory and hence had
better be using proc_exit()).  What *will* happen in the code as given
is that control will bounce back to the sigsetjmp, which will recover
and re-call AutoVacLoop, which would be okay except you just forgot
any open backend connections you have (plus leaked all the memory in
your data structures).  Maybe it would be better if you did not have
an error catcher but just aborted the process on any serious error,
letting the postmaster start a new one.  (This ties into your FIXME
note about how the postmaster should react to autovac crashes...)
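
The leak described above can be reproduced in a small standalone model. The sketch below uses plain setjmp/longjmp in place of sigsetjmp and the elog(ERROR) machinery, and FakeConn, failing_pass() and run_daemon() are hypothetical names invented for the illustration. Each pass opens a connection and then hits an error; unless the recovery point releases what was open, every restart of the loop forgets one more connection.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONNS 4

/* Hypothetical stand-in for an open backend connection. */
typedef struct { bool open; } FakeConn;

static FakeConn conns[MAX_CONNS];
static jmp_buf  recover_point;

static int
open_conn_count(void)
{
    int n = 0;
    for (int i = 0; i < MAX_CONNS; i++)
        if (conns[i].open)
            n++;
    return n;
}

static void
close_all_conns(void)
{
    for (int i = 0; i < MAX_CONNS; i++)
        conns[i].open = false;
}

/* Returns a free slot marked open, or NULL if the pool is exhausted. */
static FakeConn *
fake_connect(void)
{
    for (int i = 0; i < MAX_CONNS; i++)
        if (!conns[i].open)
        {
            conns[i].open = true;
            return &conns[i];
        }
    return NULL;
}

/* One pass of the daemon loop: connect, then fail before disconnecting. */
static void
failing_pass(void)
{
    FakeConn *c = fake_connect();

    assert(c != NULL);
    longjmp(recover_point, 1);      /* models an error: never returns */
}

/* Runs `passes` failing passes; returns how many connections remain open. */
int
run_daemon(int passes, bool cleanup_on_error)
{
    volatile int done = 0;          /* volatile: modified between setjmp and longjmp */

    close_all_conns();              /* start from a clean pool */

    if (setjmp(recover_point) != 0)
    {
        /* Error recovery: the loop is about to be restarted. */
        if (cleanup_on_error)
            close_all_conns();
        done++;
    }

    while (done < passes)
        failing_pass();

    return open_conn_count();
}
```

Here run_daemon(3, false) ends with three connections still open, while run_daemon(3, true) ends with none. The alternative suggested above, exiting the process on any serious error and letting the postmaster start a new one, sidesteps the bookkeeping entirely because the operating system reclaims everything.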

* autovacuum.h exports way too much stuff that is not relevant to
other modules.  (Rule #1: you don't declare static functions in a
header file; these *will* provoke warnings.)


I'm not sure what we do now.  I can't apply this in its current state,
and I do not have time to fix it.  I don't really want to push it in
and assume we can fix the problems during beta ...

            regards, tom lane
