Thread: Autovacuum launcher

Autovacuum launcher

From
Alvaro Herrera
Date:
[resending, with the patch gzipped, as the previous one seems to be lost
-- I mistakenly sent it to -hackers instead of -patches]

Hi,

Here's the autovacuum launcher patch I'm considering for inclusion.

Possibly controversial stuff:

1. I changed InitPostgres to be able to get a database by OID.  This is
better than having to convert the OID to name in the launcher by reading
the pg_database flat file, and then back to OID in InitPostgres by
reading same.  I had to add a FindMyDatabaseByOid to this effect.
In autovacuum mode, InitPostgres() passes the database name back, so
that the autovacuum worker process doesn't have to compute it
separately.

2. I treat autovac workers as backends in Postmaster, and put them in
the regular list of backends.  This makes them respond to
SignalChildren() cleanly; and it means they don't need any special code
in case of shutdown, etc, because they are managed just like any other
backend.  I intended to add an is_autovacuum flag to the Backend struct,
but it turns out to be unnecessary.

Not so controversial, but maybe of note:

1. I removed all frammishes to slow down in case of failure (start_time,
stop_time, etc).

2. Autovacuum launcher is a dummy process, so it's treated specially in
postmaster.

3. Autovacuum now has a shared memory area.  Currently it's fairly small
and contains a single OID and a PID, but the idea is to extend it to
pass more info to the workers in future patches.


Other than that it seems fairly vanilla to me.


One thing I noticed is that when autovac is off, it may take up to 60
seconds to process a signal passed down via SendPostmasterSignal().
I assume this is due to our system calls being restartable, i.e. it's
restarted by the operating system after the signal is received (if this
is not the case please let me know).  This should not be much of a
problem, except if the postmaster is idle for a long time (for example
because all connections are served from a pool) and a lot of
transactions are consumed in the 60 sec period.  Not sure if I should
worry too much about this.


Comments are much appreciated.  I intend to apply this in a couple of
days unless there are objections.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: Autovacuum launcher

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Here's the autovacuum launcher patch I'm considering for inclusion.

This part is very seriously broken:

diff -c -p -r1.33 lwlock.h
*** src/include/storage/lwlock.h    5 Jan 2007 22:19:58 -0000    1.33
--- src/include/storage/lwlock.h    13 Feb 2007 16:58:41 -0000
*************** typedef enum LWLockId
*** 62,67 ****
--- 62,68 ----
      BtreeVacuumLock,
      AddinShmemInitLock,
      FirstBufMappingLock,
+     AutovacuumLock,
      FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,

      /* must be last except for MaxDynamicLWLock: */

I'm surprised it got through your testing at all, with the autovacuum
lock conflicting with bufmgr.

Making InitPostgres's call API vary depending on
IsAutoVacuumWorkerProcess seems really ugly, and unnecessary.
Why not just test for dbname == NULL or some other convention
expressed by the arguments themselves?

> One thing I noticed is that when autovac is off, it may take up to 60
> seconds to process a signal passed down via SendPostmasterSignal().

That's because if the signal arrives while postmaster has signals
blocked, it'll be serviced at PG_SETMASK(&UnBlockSig), and then (if no
connection requests are arriving) the select() timeout delay will elapse
before the main loop notices the flag.  You probably want to reconsider
the decision to not have sigusr1_handler() launch the child immediately.
The current coding is the way it is because an autovac child can itself
send the signal and we want that to be interpreted as "start another one
after the current one finishes".  In the launcher-and-workers world,
though, I'm not convinced we need any such behavior.

One problem I see with this is you've removed the code that sent SIGINT
to an active autovac process during "smart shutdown".  This seems a bad
idea because now shutdown response will be limited by the slowest
autovac, and that will tempt DBAs with itchy trigger fingers to do
something stupid.  It'd be a good idea to extend the Backend struct
to mark which backends are autovacs, so that you can preserve the
behavior of sending them SIGINTs at shutdown.

            regards, tom lane

Re: Autovacuum launcher

From
Alvaro Herrera
Date:
I've fixed all other problems according to suggestions, including adding
a SignalSomeChildren(int signal, bool only_autovac) function to
postmaster so that it can shut autovac workers down in case of smart
shutdown.

Tom Lane wrote:

> Making InitPostgres's call API vary depending on
> IsAutoVacuumWorkerProcess seems really ugly, and unnecessary.
> Why not just test for dbname == NULL or some other convention
> expressed by the arguments themselves?

Hmm, yeah.  Modified version below.  In BootstrapMain, it surprises me
that the dbname is getting computed, and seems to be a required
parameter, when it doesn't seem to be used at all.  That code needs a
serious refactoring ...

/* --------------------------------
 * InitPostgres
 *      Initialize POSTGRES.
 *
 * The database can be specified by name, using the in_dbname parameter, or by
 * OID, using the dboid parameter.  In the latter case, the computed database
 * name is passed out to the caller as a palloc'ed string in out_dbname.
 *
 * In bootstrap mode no parameters are used.
 *
 * The return value indicates whether the userID is a superuser.  (That
 * can only be tested inside a transaction, so we want to do it during
 * the startup transaction rather than doing a separate one in postgres.c.)
 *
 * As of PostgreSQL 8.2, we expect InitProcess() was already called, so we
 * already have a PGPROC struct ... but it's not filled in yet.
 *
 * Note:
 *      Be very careful with the order of calls in the InitPostgres function.
 * --------------------------------
 */
bool
InitPostgres(const char *in_dbname, Oid dboid, const char *username,
             char **out_dbname)
{
    bool        bootstrap = IsBootstrapProcessingMode();
    bool        autovacuum = IsAutoVacuumWorkerProcess();
    bool        am_superuser;
    char       *fullpath;
    char        dbname[NAMEDATALEN];

    /*
     * Set up the global variables holding database id and path.  But note we
     * won't actually try to touch the database just yet.
     *
     * We take a shortcut in the bootstrap case, otherwise we have to look up
     * the db name in pg_database.
     */
    if (bootstrap)
    {
        MyDatabaseId = TemplateDbOid;
        MyDatabaseTableSpace = DEFAULTTABLESPACE_OID;
    }
    else
    {
        /*
         * Find tablespace of the database we're about to open. Since we're not
         * yet up and running we have to use one of the hackish FindMyDatabase
         * variants, which look in the flat-file copy of pg_database.
         *
         * If the in_dbname param is NULL, lookup database by OID.
         */
        if (in_dbname == NULL)
        {
            if (!FindMyDatabaseByOid(dboid, dbname, &MyDatabaseTableSpace))
                ereport(FATAL,
                        (errcode(ERRCODE_UNDEFINED_DATABASE),
                         errmsg("database %u does not exist", dboid)));
            MyDatabaseId = dboid;
            /* pass the database name to the caller */
            *out_dbname = pstrdup(dbname);
        }
        else
        {
            if (!FindMyDatabase(in_dbname, &MyDatabaseId, &MyDatabaseTableSpace))
                ereport(FATAL,
                        (errcode(ERRCODE_UNDEFINED_DATABASE),
                         errmsg("database \"%s\" does not exist",
                                in_dbname)));
            /* our database name is gotten from the caller */
            strlcpy(dbname, in_dbname, NAMEDATALEN);
        }
    }

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: Autovacuum launcher

From
Alvaro Herrera
Date:
Tom Lane wrote:

I forgot to comment:

> This part is very seriously broken:
>
> diff -c -p -r1.33 lwlock.h
> *** src/include/storage/lwlock.h    5 Jan 2007 22:19:58 -0000    1.33
> --- src/include/storage/lwlock.h    13 Feb 2007 16:58:41 -0000
> *************** typedef enum LWLockId
> *** 62,67 ****
> --- 62,68 ----
>       BtreeVacuumLock,
>       AddinShmemInitLock,
>       FirstBufMappingLock,
> +     AutovacuumLock,
>       FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
>
>       /* must be last except for MaxDynamicLWLock: */
>
> I'm surprised it got through your testing at all, with the autovacuum
> lock conflicting with bufmgr.

I didn't look closely enough to notice the arithmetic.  I think the only
reason this worked at all was because the autovacuum lock was being held
for very short periods of time, always for assigning or reading some
variable.

I think a warning comment is warranted here -- will include it when I
commit the patch.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Autovacuum launcher

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I think a warning comment is warranted here -- will include it when I
> commit the patch.

I was thinking the same, but didn't want to create a merge problem for
you.  Maybe "Individual lock IDs end here" or some such?

            regards, tom lane