Re: [PATCHES] Infrastructure changes for recovery (v8) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [PATCHES] Infrastructure changes for recovery (v8)
Date
Msg-id 1223468660.4747.294.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Re: [PATCHES] Infrastructure changes for recovery (v8)  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: [PATCHES] Infrastructure changes for recovery (v8)
List pgsql-hackers
On Wed, 2008-10-08 at 14:43 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > * optional recovery_safe_start_location parameter now provided in
> > recovery.conf, to allow a consistency point to be manually defined if a
> > base backup was not taken using standard pg_start/stop backup functions
> 
> Do we want to cater for that? It only seems safe if you have 
> full_page_writes turned on, and you perform a checkpoint first. But if 
> you do that, why don't you just use pg_start_backup()?

I'm easy on that one. It is a supported backup method, so without this
it would not be possible to utilise Hot Standby in conjunction with this
backup technique. Not many people use it, but I guess some do.

> > Other Changes
> > * log_restartpoints removed, use log_checkpoints in postgresql.conf
> 
> Is this something that would make sense regardless of the rest of the 
> patch? If so, we could apply that separately, which would make this 
> patch a little less overwhelming to review.

Maybe, it's fairly minor.

> > * additional function signature for pg_start_backup('label', true |
> > false) to allow definition of immediate checkpoint/not
> 
> Wouldn't this need a new entry in pg_proc.h? Again, perhaps we should do 
> this as a separate patch.

That's concerning. I remember adding the entry and assigning a new oid,
but it isn't in the patch. The multi-argument version was definitely
tested, that's how I discovered the bug also fixed in the patch.

> > * fixes bug discovered while other testing: if pg_stop_backup() is run
> > when xlogswitch has just occurred then we do not switch log files, yet
> > we return current filename even though nothing of value in it. If
> > archive_timeout not enabled we would wait forever for pg_stop_backup()
> > to return. 

OK, I'll strip all of the above out, for separate consideration.

> > * Substantial comments throughout
> 
> These comments on CheckPointLock seem contradictory:
> 
> > --- 247,256 ----
> >    * ControlFileLock: must be held to read/update control file or create
> >    * new log file.
> >    *
> > !  * CheckpointLock: must be held to do a checkpoint or restartpoint, ensuring
> > !  * we get just one of those at any time. In 8.4+ recovery, both startup and
> > !  * bgwriter processes may take restartpoints, so this locking must be strict 
> > !  * to ensure there are no mistakes.
> >    *
> >    *----------
> >    */
> 
> and
> 
> > --- 5901,5916 ----
> >       XLogRecPtr    recptr;
> >       XLogCtlInsert *Insert = &XLogCtl->Insert;
> >       XLogRecData rdata;
> >       uint32        _logId;
> >       uint32        _logSeg;
> >       TransactionId *inCommitXids;
> >       int            nInCommit;
> > +     bool        leavingArchiveRecovery = false;
> >   
> >       /*
> >        * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
> > !      * That shouldn't be happening, but checkpoints are an important aspect
> > !      * of our resilience, so we take no chances.
> >        */
> >       LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
> >   
> 
> If I've understood the patch correctly, only bgwriter does checkpoints 
> and restart points now?

Tom requested that we retain the ability for Startup process to perform
restartpoints up until the point that bgwriter spawns, then after that
bgwriter performs them.

The form is this

PM_START    startup process performs restartpoints    transition when database is consistent state
PM_RECOVERY    bgwriter process performs restartpoints    delicate transition between two states
PM_RUN        bgwriter process performs checkpoints

> There's a trivial merge conflict in bgwriter.c, due to the FSM patch.

OK, will look.

Thanks for looking.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Reducing some DDL Locks to ShareLock
Next
From: Markus Wanner
Date:
Subject: Re: problem with compilation on fedora core 10 64 bit