Thread: can't stop autovacuum by HUP'ing the server

can't stop autovacuum by HUP'ing the server

From
Dave Cramer
Date:
I'd like to stop autovac by changing the conf file then sending the  
server a HUP

This "appears" to work, the logs show autovac terminated by  
administrative command. Then a few minutes later I see a vacuum  
process spawned.

Is it possible that there are timers that aren't being properly  
terminated here ?

Dave


Re: can't stop autovacuum by HUP'ing the server

From
Tom Lane
Date:
Dave Cramer <pg@fastcrypt.com> writes:
> I'd like to stop autovac by changing the conf file then sending the  
> server a HUP

Uh ... why should that stop an autovac already in progress?  I'd
only expect it to affect future launches.
        regards, tom lane


Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Hi Dave,

Dave Cramer wrote:
> I'd like to stop autovac by changing the conf file then sending the  
> server a HUP
>
> This "appears" to work, the logs show autovac terminated by  
> administrative command. Then a few minutes later I see a vacuum process 
> spawned.
>
> Is it possible that there are timers that aren't being properly  
> terminated here ?

I wonder whether there are tables that need an emergency vacuum (i.e.
they have reached the freeze horizon).  What version are you running
anyway?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: can't stop autovacuum by HUP'ing the server

From
Dave Cramer
Date:
On 24-Aug-08, at 10:12 PM, Tom Lane wrote:

> Dave Cramer <pg@fastcrypt.com> writes:
>> I'd like to stop autovac by changing the conf file then sending the
>> server a HUP
>
> Uh ... why should that stop an autovac already in progress?  I'd
> only expect it to affect future launches.
>

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276]  LOG:  autovacuum launcher shutting down
2008-08-25 04:00:01 EDT [20526]  LOG:  autovacuum launcher started
2008-08-25 04:00:01 EDT [21404]  FATAL:  terminating autovacuum  
process due to administrator command
2008-08-25 04:00:01 EDT [23418]  FATAL:  terminating autovacuum  
process due to administrator command
2008-08-25 04:00:01 EDT [21703]  FATAL:  terminating autovacuum  
process due to administrator command

So now that I've read this why would the launcher start again ?

Dave
>             regards, tom lane



Re: can't stop autovacuum by HUP'ing the server

From
Dave Cramer
Date:
On 24-Aug-08, at 10:17 PM, Alvaro Herrera wrote:

> Hi Dave,
>
> Dave Cramer wrote:
>> I'd like to stop autovac by changing the conf file then sending the
>> server a HUP
>>
>> This "appears" to work, the logs show autovac terminated by
>> administrative command. Then a few minutes later I see a vacuum  
>> process
>> spawned.
>>
>> Is it possible that there are timers that aren't being properly
>> terminated here ?
>
> I wonder whether there are tables that need an emergency vacuum (i.e.
> they have reached the freeze horizon).  What version are you running
> anyway?
>
version 8.3 and doubtful.

Since a number of people actually do this it would seem like a better  
way to temporarily suspend autovac should be on the todo ?

Dave
> -- 
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:

> Since a number of people actually do this it would seem like a better  
> way to temporarily suspend autovac should be on the todo ?

No -- what you are doing is documented to work.  However, it only stops
the autovac launcher, not a currently-running worker.  If this isn't
working, I'd like to know why.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:

> Well, I go the extra mile and kill any remaing autovac procs
>
> Here are the logs
>
> 2008-08-25 04:00:01 EDT [32276]  LOG:  autovacuum launcher shutting down
> 2008-08-25 04:00:01 EDT [20526]  LOG:  autovacuum launcher started

What did you SIGHUP, the launcher or postmaster?  You need the latter.
The launcher should exit automatically at that time.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: can't stop autovacuum by HUP'ing the server

From
Dave Cramer
Date:
On 25-Aug-08, at 10:43 AM, Alvaro Herrera wrote:

> Dave Cramer wrote:
>
>> Well, I go the extra mile and kill any remaing autovac procs
>>
>> Here are the logs
>>
>> 2008-08-25 04:00:01 EDT [32276]  LOG:  autovacuum launcher shutting  
>> down
>> 2008-08-25 04:00:01 EDT [20526]  LOG:  autovacuum launcher started
>
> What did you SIGHUP, the launcher or postmaster?  You need the latter.
> The launcher should exit automatically at that time.

No, I am HUP'ing the postmaster then subsequently killing any  
autovacuums still around, which may be the problem.

I may be killing the launcher prematurely. I'll try again tonight.

Dave
>
>
> -- 
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Mon, Aug 25, 2008 at 1:07 PM, Dave Cramer <pg@fastcrypt.com> wrote:

On 25-Aug-08, at 10:43 AM, Alvaro Herrera wrote:

Dave Cramer wrote:

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276]  LOG:  autovacuum launcher shutting down
2008-08-25 04:00:01 EDT [20526]  LOG:  autovacuum launcher started

What did you SIGHUP, the launcher or postmaster?  You need the latter.
The launcher should exit automatically at that time.

No, I am HUP'ing the postmaster then subsequently killing any autovacuums still around, which may be the problem.

I may be killing the launcher prematurely. I'll try again tonight.

Ok, here are the logs from last night

2008-08-26 04:00:02 EDT [25407]  LOG:  received SIGHUP, reloading configuration files
2008-08-26 04:00:02 EDT [22649]  LOG:  autovacuum launcher shutting down
2008-08-26 04:00:02 EDT [30438]  LOG:  autovacuum launcher started


you can see the SIGHUP, the launcher being shut down, and starting right back up again ???

is this expected behaviour ?

Dave

Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:

> Ok, here are the logs from last night
> 
> 2008-08-26 04:00:02 EDT [25407]  LOG:  received SIGHUP, reloading
> configuration files
> 2008-08-26 04:00:02 EDT [22649]  LOG:  autovacuum launcher shutting down
> 2008-08-26 04:00:02 EDT [30438]  LOG:  autovacuum launcher started
> 
> 
> you can see the SIGHUP, the launcher being shut down, and starting right
> back up again ???
> 
> is this expected behaviour ?

Certainly not, and that's not what I see here either.  I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Dave Cramer wrote:

> Ok, here are the logs from last night
>
> 2008-08-26 04:00:02 EDT [25407]  LOG:  received SIGHUP, reloading
> configuration files
> 2008-08-26 04:00:02 EDT [22649]  LOG:  autovacuum launcher shutting down
> 2008-08-26 04:00:02 EDT [30438]  LOG:  autovacuum launcher started
>
>
> you can see the SIGHUP, the launcher being shut down, and starting right
> back up again ???
>
> is this expected behaviour ?

Certainly not, and that's not what I see here either.  I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was on, and it was not.

Dave


--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Tue, Aug 26, 2008 at 9:59 AM, Dave Cramer <pg@fastcrypt.com> wrote:


On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Dave Cramer wrote:

> Ok, here are the logs from last night
>
> 2008-08-26 04:00:02 EDT [25407]  LOG:  received SIGHUP, reloading
> configuration files
> 2008-08-26 04:00:02 EDT [22649]  LOG:  autovacuum launcher shutting down
> 2008-08-26 04:00:02 EDT [30438]  LOG:  autovacuum launcher started
>
>
> you can see the SIGHUP, the launcher being shut down, and starting right
> back up again ???
>
> is this expected behaviour ?

Certainly not, and that's not what I see here either.  I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was on, and it was not.

Dave

So where do we go from here ?


Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:
> On Tue, Aug 26, 2008 at 9:59 AM, Dave Cramer <pg@fastcrypt.com> wrote:
> 
> > On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <
> > alvherre@commandprompt.com> wrote:

> >> Certainly not, and that's not what I see here either.  I assume process
> >> 25407 is (was) the postmaster, yes?
> >>
> >> If you "show autovacuum", is it on?
> >
> > Yes that was the postmaster, and I did check to see if autovacuum was on,
> > and it was not.
> >
> So where do we go from here ?

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues.  This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
insane value?

varsup.c line 246   /*      * We'll start trying to force autovacuums when oldest_datfrozenxid gets    * to be more
thanautovacuum_freeze_max_age transactions old.    *          * Note: guc.c ensures that autovacuum_freeze_max_age is
ina sane range,    * so that xidVacLimit will be well before xidWarnLimit.    *      * [...]    */        xidVacLimit =
oldest_datfrozenxid+ autovacuum_freeze_max_age;
 
   ...
   if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&       IsUnderPostmaster)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);



However, I think that in allowed configurations you should also receive
these warnings:
   /* Give an immediate warning if past the wrap warn point */   if (TransactionIdFollowsOrEquals(curXid,
xidWarnLimit))      ereport(WARNING,          (errmsg("database \"%s\" must be vacuumed within %u transactions",
         NameStr(*oldest_datname),                  xidWrapLimit - curXid),           errhint("To avoid a database
shutdown,execute a full-database VACUUM in \"%s\".",                   NameStr(*oldest_datname))));
 


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Dave Cramer wrote:
> On Tue, Aug 26, 2008 at 9:59 AM, Dave Cramer <pg@fastcrypt.com> wrote:
>
> > On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <
> > alvherre@commandprompt.com> wrote:

> >> Certainly not, and that's not what I see here either.  I assume process
> >> 25407 is (was) the postmaster, yes?
> >>
> >> If you "show autovacuum", is it on?
> >
> > Yes that was the postmaster, and I did check to see if autovacuum was on,
> > and it was not.
> >
> So where do we go from here ?

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues.  This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
insane value?

Doesn't appear to be insane ?

#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum

Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:
> On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <alvherre@commandprompt.com

> > The only possible explanation for this behavior is that somebody is
> > signalling the postmaster due to Xid wraparound issues.  This is keyed
> > on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
> > insane value?
> 
> Doesn't appear to be insane ?
> 
> #autovacuum_freeze_max_age = 200000000  # maximum XID age before forced
> vacuum

Not only sane, but also the default ;-)

What's the max age(pg_database.datfrozenxid)?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Tue, Aug 26, 2008 at 11:41 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Dave Cramer wrote:
> On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <alvherre@commandprompt.com

> > The only possible explanation for this behavior is that somebody is
> > signalling the postmaster due to Xid wraparound issues.  This is keyed
> > on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
> > insane value?
>
> Doesn't appear to be insane ?
>
> #autovacuum_freeze_max_age = 200000000  # maximum XID age before forced
> vacuum

Not only sane, but also the default ;-)

What's the max age(pg_database.datfrozenxid)?

select datfrozenxid from pg_database ;
 datfrozenxid
--------------
    201850617
    101850961
     86039359
     21522712


--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Tue, Aug 26, 2008 at 11:51 AM, Dave Cramer <pg@fastcrypt.com> wrote:


On Tue, Aug 26, 2008 at 11:41 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Dave Cramer wrote:
> On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <alvherre@commandprompt.com

> > The only possible explanation for this behavior is that somebody is
> > signalling the postmaster due to Xid wraparound issues.  This is keyed
> > on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
> > insane value?
>
> Doesn't appear to be insane ?
>
> #autovacuum_freeze_max_age = 200000000  # maximum XID age before forced
> vacuum

Not only sane, but also the default ;-)

What's the max age(pg_database.datfrozenxid)?

select datfrozenxid from pg_database ;
 datfrozenxid
--------------
    201850617
    101850961
     86039359
     21522712


this code in autovacuum.c looks like it might be interesting

                        if (AutoVacuumShmem->av_signal[AutoVacForkFailed])
                        {
                                /*
                                 * If the postmaster failed to start a new worker, we sleep
                                 * for a little while and resend the signal.  The new worker's
                                 * state is still in memory, so this is sufficient.  After
                                 * that, we restart the main loop.
                                 *
                                 * XXX should we put a limit to the number of times we retry?
                                 * I don't think it makes much sense, because a future start
                                 * of a worker will continue to fail in the same way.
                                 */
                                AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
                                pg_usleep(100000L);             /* 100ms */
                                SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
                                continue;

Do these signals get cleaned up on a reload ?

Dave

Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:
> On Tue, Aug 26, 2008 at 11:41 AM, Alvaro Herrera <alvherre@commandprompt.com

> > What's the max age(pg_database.datfrozenxid)?
> 
> select datfrozenxid from pg_database ;
>  datfrozenxid
> --------------
>     201850617
>     101850961
>      86039359
>      21522712

Well, the first one is over the limit, isn't it?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:


> AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
>                                 pg_usleep(100000L);             /* 100ms */
> 
> SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
>                                 continue;
> 
> Do these signals get cleaned up on a reload ?

Well, not on a reload specifically, but this signal is of prompt
response (i.e. the postmaster acts immediately on it).  See
CheckPostmasterSignal.  Also, note that this code starts a worker, not
the launcher which is what you're seeing.

The signal you're looking for is PMSIGNAL_START_AUTOVAC_LAUNCHER (see
the varsup.c code in the vicinity of what I posted earlier).  The
postmaster response is to set start_autovac_launcher (see
sigusr1_handler in postmaster.c) and when this is seen set, the launcher
is started (see ServerLoop in postmaster.c).  However the bit you're
interested in is *why* the signal is being sent, which is what the
freeze limits determine.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:
<div dir="ltr"><br /><br /><div class="gmail_quote">On Tue, Aug 26, 2008 at 12:10 PM, Alvaro Herrera <span
dir="ltr"><<ahref="mailto:alvherre@commandprompt.com">alvherre@commandprompt.com</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex;
padding-left:1ex;"><div class="Ih2E3d">Dave Cramer wrote:<br /><br /><br /> >
AutoVacuumShmem->av_signal[AutoVacForkFailed]= false;<br /> >                                 pg_usleep(100000L);
           /* 100ms */<br /> ><br /> > SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);<br /> >          
                     continue;<br /> ><br /> > Do these signals get cleaned up on a reload ?<br /><br
/></div>Well,not on a reload specifically, but this signal is of prompt<br /> response (i.e. the postmaster acts
immediatelyon it).  See<br /> CheckPostmasterSignal.  Also, note that this code starts a worker, not<br /> the launcher
whichis what you're seeing.<br /><br /></blockquote></div><br />Ok, back to why<br /><br />turns out template0 is the
culprit,why is autovac not vacuuming this ?<br /><br />Dave<br /></div> 

Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:

> Ok, back to why
> 
> turns out template0 is the culprit, why is autovac not vacuuming this ?

Hmm ... template0 is not supposed to need vacuuming, because it is
frozen ... is it marked with datallowconn=false?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Tue, Aug 26, 2008 at 12:21 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Dave Cramer wrote:

> Ok, back to why
>
> turns out template0 is the culprit, why is autovac not vacuuming this ?

Hmm ... template0 is not supposed to need vacuuming, because it is
frozen ... is it marked with datallowconn=false?

Yes

 select * from pg_database where datname='template0';
  datname  | datdba | encoding | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | dattablespace | datconfig |               datacl               
-----------+--------+----------+---------------+--------------+--------------+---------------+--------------+---------------+-----------+-------------------------------------
 template0 |     10 |        6 | t             | f            |           -1 |         11510 |    201850617 |          1663 |           | {=c/postgres,postgres=CTc/postgres}

So how to fix ?
 


--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: can't stop autovacuum by HUP'ing the server

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Dave Cramer wrote:
>> turns out template0 is the culprit, why is autovac not vacuuming this ?

> Hmm ... template0 is not supposed to need vacuuming, because it is
> frozen ... is it marked with datallowconn=false?

8.3's autovac doesn't care about that, does it?

Seems like the next step is to enable logging of autovac's decision-making.
        regards, tom lane


Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:

> Yes
> 
>  select * from pg_database where datname='template0';
>   datname  | datdba | encoding | datistemplate | datallowconn | datconnlimit
> | datlastsysoid | datfrozenxid | dattablespace | datconfig |
> datacl
>
-----------+--------+----------+---------------+--------------+--------------+---------------+--------------+---------------+-----------+-------------------------------------
>  template0 |     10 |        6 | t             | f            |           -1
> |         11510 |    201850617 |          1663 |           |
> {=c/postgres,postgres=CTc/postgres}
> 
> So how to fix ?

I think I see the problem -- vac_truncate_clog is not ignoring these
databases when passing the new frozen value to SetTransactionIdLimit.
   /*    * Scan pg_database to compute the minimum datfrozenxid    *    * Note: we need not worry about a race
conditionwith new entries being    * inserted by CREATE DATABASE.  Any such entry will have a copy of some    *
existingDB's datfrozenxid, and that source DB cannot be ours because    * of the interlock against copying a DB
containingan active backend.    * Hence the new entry will not reduce the minimum.  Also, if two VACUUMs    *
concurrentlymodify the datfrozenxid's of different databases, the    * worst possible outcome is that pg_clog is not
truncatedas aggressively    * as it could be.    */   relation = heap_open(DatabaseRelationId, AccessShareLock);
 
   scan = heap_beginscan(relation, SnapshotNow, 0, NULL);
   while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)   {       Form_pg_database dbform =
(Form_pg_database)GETSTRUCT(tuple);
 
       Assert(TransactionIdIsNormal(dbform->datfrozenxid));
       if (TransactionIdPrecedes(myXID, dbform->datfrozenxid))           frozenAlreadyWrapped = true;       else if
(TransactionIdPrecedes(dbform->datfrozenxid,frozenXID))       {           frozenXID = dbform->datfrozenxid;
namecpy(&oldest_datname,&dbform->datname);       }   }
 
   ...
   /*    * Update the wrap limit for GetNewTransactionId.  Note: this function    * will also signal the postmaster for
an(other)autovac cycle if needed.    */   SetTransactionIdLimit(frozenXID, &oldest_datname);
 


If it doesn't ignore them, then it should be properly vacuuming
template0 as any other database.  We've changed autovac's behavior on
this area back and forth so I may be misremembering what's our rationale
du jour.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Tue, Aug 26, 2008 at 12:50 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Dave Cramer wrote:

> Yes
>
>  select * from pg_database where datname='template0';
>   datname  | datdba | encoding | datistemplate | datallowconn | datconnlimit
> | datlastsysoid | datfrozenxid | dattablespace | datconfig |
> datacl
> -----------+--------+----------+---------------+--------------+--------------+---------------+--------------+---------------+-----------+-------------------------------------
>  template0 |     10 |        6 | t             | f            |           -1
> |         11510 |    201850617 |          1663 |           |
> {=c/postgres,postgres=CTc/postgres}
>
> So how to fix ?

I think I see the problem -- vac_truncate_clog is not ignoring these
databases when passing the new frozen value to SetTransactionIdLimit.

   /*
    * Scan pg_database to compute the minimum datfrozenxid
    *
    * Note: we need not worry about a race condition with new entries being
    * inserted by CREATE DATABASE.  Any such entry will have a copy of some
    * existing DB's datfrozenxid, and that source DB cannot be ours because
    * of the interlock against copying a DB containing an active backend.
    * Hence the new entry will not reduce the minimum.  Also, if two VACUUMs
    * concurrently modify the datfrozenxid's of different databases, the
    * worst possible outcome is that pg_clog is not truncated as aggressively
    * as it could be.
    */
   relation = heap_open(DatabaseRelationId, AccessShareLock);

   scan = heap_beginscan(relation, SnapshotNow, 0, NULL);

   while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
   {
       Form_pg_database dbform = (Form_pg_database) GETSTRUCT(tuple);

       Assert(TransactionIdIsNormal(dbform->datfrozenxid));

       if (TransactionIdPrecedes(myXID, dbform->datfrozenxid))
           frozenAlreadyWrapped = true;
       else if (TransactionIdPrecedes(dbform->datfrozenxid, frozenXID))
       {
           frozenXID = dbform->datfrozenxid;
           namecpy(&oldest_datname, &dbform->datname);
       }
   }

   ...

   /*
    * Update the wrap limit for GetNewTransactionId.  Note: this function
    * will also signal the postmaster for an(other) autovac cycle if needed.
    */
   SetTransactionIdLimit(frozenXID, &oldest_datname);


If it doesn't ignore them, then it should be properly vacuuming
template0 as any other database.  We've changed autovac's behavior on
this area back and forth so I may be misremembering what's our rationale
du jour.

Well, I'm willing to help debug this, however this is a busy production database and I need to be able to turn it off for a few hours a day. Would changing autovacuum_freeze_max_age be a solution ?

Dave

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: can't stop autovacuum by HUP'ing the server

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> If it doesn't ignore them, then it should be properly vacuuming
> template0 as any other database.  We've changed autovac's behavior on
> this area back and forth so I may be misremembering what's our rationale
> du jour.

AFAICS, the only way in which current autovac treats !datallowconn
databases specially is this test in do_autovacuum:
if (dbForm->datistemplate || !dbForm->datallowconn)    default_freeze_min_age = 0;else    default_freeze_min_age =
vacuum_freeze_min_age;

Perhaps there's something wrong with the idea of setting freeze_min_age
to zero?
        regards, tom lane


Re: can't stop autovacuum by HUP'ing the server

From
Joshua Drake
Date:
On Tue, 26 Aug 2008 12:58:59 -0400
"Dave Cramer" <pg@fastcrypt.com> wrote:

> 
> Well, I'm willing to help debug this, however this is a busy
> production database and I need to be able to turn it off for a few
> hours a day. Would changing autovacuum_freeze_max_age be a solution ?

Populate the table pg_autovacuum with all your relations and the
defaults from the postgresql.conf. Then set enabled to FALSE on all the
tuples. When you are ready to turn autovacuum back on, set it to TRUE.

Joshua D. Drake



-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:

> Well, I'm willing to help debug this, however this is a busy production
> database and I need to be able to turn it off for a few hours a day. Would
> changing autovacuum_freeze_max_age be a solution ?

Yes.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > If it doesn't ignore them, then it should be properly vacuuming
> > template0 as any other database.  We've changed autovac's behavior on
> > this area back and forth so I may be misremembering what's our rationale
> > du jour.
> 
> AFAICS, the only way in which current autovac treats !datallowconn
> databases specially is this test in do_autovacuum:
> 
>     if (dbForm->datistemplate || !dbForm->datallowconn)
>         default_freeze_min_age = 0;
>     else
>         default_freeze_min_age = vacuum_freeze_min_age;
> 
> Perhaps there's something wrong with the idea of setting freeze_min_age
> to zero?

Nope, AFAICS it's harmless; what it means is that on those databases,
all tuples will be frozen immediately.

I'll try to reproduce the problem here.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: can't stop autovacuum by HUP'ing the server

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Dave Cramer wrote:
>> Well, I'm willing to help debug this, however this is a busy production
>> database and I need to be able to turn it off for a few hours a day. Would
>> changing autovacuum_freeze_max_age be a solution ?

> Yes.

Could we first see a cycle of autovac log output with
log_autovacuum_min_duration = 0?
Otherwise we're not going to get closer to understanding why it's
not cleaning up template0 for you.
        regards, tom lane


Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Tue, Aug 26, 2008 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Dave Cramer wrote:
>> Well, I'm willing to help debug this, however this is a busy production
>> database and I need to be able to turn it off for a few hours a day. Would
>> changing autovacuum_freeze_max_age be a solution ?

> Yes.

Could we first see a cycle of autovac log output with
log_autovacuum_min_duration = 0?
Otherwise we're not going to get closer to understanding why it's
not cleaning up template0 for you. 

I actually set that last time I restarted. There's nothing particularly interesting there. Is there another log GUC that needs to be tweaked to get more output ?


Dave

Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:
> On Tue, Aug 26, 2008 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > Could we first see a cycle of autovac log output with
> > log_autovacuum_min_duration = 0?
> > Otherwise we're not going to get closer to understanding why it's
> > not cleaning up template0 for you.
> 
> I actually set that last time I restarted. There's nothing particularly
> interesting there. Is there another log GUC that needs to be tweaked to get
> more output ?

My guess is that autovacuum is skipping the database for some reason, so
there's no log entry at all.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: can't stop autovacuum by HUP'ing the server

From
"Dave Cramer"
Date:


On Tue, Aug 26, 2008 at 1:45 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Dave Cramer wrote:
> On Tue, Aug 26, 2008 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > Could we first see a cycle of autovac log output with
> > log_autovacuum_min_duration = 0?
> > Otherwise we're not going to get closer to understanding why it's
> > not cleaning up template0 for you.
>
> I actually set that last time I restarted. There's nothing particularly
> interesting there. Is there another log GUC that needs to be tweaked to get
> more output ?

My guess is that autovacuum is skipping the database for some reason, so
there's no log entry at all.

Seems like a viable explanation, but doesn't advance us any further  ?

Dave

Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Dave Cramer wrote:
> On Tue, Aug 26, 2008 at 1:45 PM, Alvaro Herrera
> <alvherre@commandprompt.com>wrote:

> > My guess is that autovacuum is skipping the database for some reason, so
> > there's no log entry at all.
> 
> Seems like a viable explanation, but doesn't advance us any further  ?

Nope, it doesn't -- we need to understand what's the reason.  I'm
checking the code.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: can't stop autovacuum by HUP'ing the server

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Nope, AFAICS it's harmless; what it means is that on those databases,
> all tuples will be frozen immediately.
> 
> I'll try to reproduce the problem here.

No luck :-(  It works as expected for me.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.