Home > mailing lists

Re: Recovery of Multi-stage WAL actions - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Recovery of Multi-stage WAL actions
Date	March 25, 2008 00:29:46
Msg-id	200803250029.m2P0TaH27926@momjian.us Whole thread Raw
In response to	Recovery of Multi-stage WAL actions (Simon Riggs <simon@2ndquadrant.com>)
List	pgsql-hackers

Tree view

Added to TODO:

> * Have resource managers report the duration of their status changes
>
>   http://archives.postgresql.org/pgsql-hackers/2007-10/msg01468.php


---------------------------------------------------------------------------

Simon Riggs wrote:
> We've had two hard to diagnose errors in recovery in recent months. ISTM
> that the core issue is the way we allow Resource Managers to have
> multi-stage WAL actions that persist for long periods of time. This
> means we have no way of telling whether the answer
> rm_safe_restartpoint() == false is a momentary, valid state or a
> progressively worsening indicator of a subtle RM bug.
> 
> An example of a multi-stage WAL action would be an index split inside
> one of the Resource Managers. Now that kind of action shouldn't take
> very long, though theoretically it could for various reasons.
> 
> Right now we have a log message to cope with this:
> http://archives.postgresql.org/pgsql-committers/2007-08/msg00374.php
> but its not nearly as helpful as we'd like it to be. We could back-patch
> this to 8.2, but I have a potentially better proposal.
> 
> I very much want to encourage authors of new Resource Managers and it
> looks like we may be getting at least 3 new RMs that produce WAL
> records: hash indexes (currently not WAL-logged), bitmap indexes and
> clustered indexes for 8.4. We should be realistic that new bugs probably
> will occur in recovery code for existing and new RMs.
> 
> What I'd like to do is force all of the RMs to record the lsn of any WAL
> record that starts an incomplete action. Then, if an incomplete action
> lives for more than a certain period of time it will be possible to
> produce a log message saying "incomplete split has survived for X
> seconds, in xlog time". That way we'll see log messages if any of the
> RMs start to push an incomplete action onto their list and then not
> consume it again.
> 
> We might trust each RM to implement code to LOG messages if their code
> goes a little awry, but I'd prefer some mechanism that allows the main
> server to check what's happening in each RM. That way we'd have a
> cross-check on whether the RM is well-behaved, plus we'd only need to
> implement the checking code once. Right now very similar, yet different
> code runs inside each RM.
> 
> So my proposal is to have an incomplete split remember/forget API that
> forces each RM to expose its incomplete split List. Currently each RM
> has a hook on rm_saferestartpoint(), so that each RM manages its own
> List. My new thought is to have one function safeRestartPoint() that
> inspects each of the incomplete split Lists to see if they are empty. If
> the lists are non-empty then inspect the age of each list entry to see
> if it is worth reporting as a possible issue. Each RM would then store
> incomplete splits using a ResourceManagerRememberIncompleteEvent(lsn,
> id_data, payload??) and ResourceManagerforgetIncompleteEvent(id_data).
> Implementation is a a bit hazy on that last part, but I think the
> overall idea is clear.
> 
> That should mean that any incomplete split that lasts for the length of
> one restartpoint, which is *at least* one checkpoint duration, should
> cause a LOG message to be produced. We might even go as far as to ignore
> super long-lived and therefore spurious incomplete splits when we issue
> rm_cleanup() for fear of allowing RM bugs to kill recovery.
> 
> I'd like to suggest that those changes be performed now for 8.3 *and*
> back-patched for 8.2. I want to make sure that all users are able to
> diagnose server errors and report them. I'm guessing that might raise a
> few eyebrows, but I think its justifiable. Bugs in complex code are
> inevitable and should not be seen to reflect badly upon RM authors.
> However, our inability to recognise RM bugs that do occur doesn't seem
> acceptable to me, especially since they may save themselves up for the
> moment of PITR fail-over. You might persuade me I'm being over-zealous
> here, but High Availability is something we have to be zealous about.
> 
> It should also be possible to allow the server to stay up even if one of
> the RM's fails to recover properly. That would need to be settable, so I
> really only mean that for "optional" RMs, i.e. index RMs only. For those
> cases we should be able to mark effected indexes by marking them
> corrupt. Automatic rebuild of corrupt indexes could also be possible,
> should it occur. That would be an 8.4 action... :-)
> 
> Comments appreciated, as ever.
> 
> -- 
>   Simon Riggs
>   2ndQuadrant  http://www.2ndQuadrant.com
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

pgsql-hackers by date:

From: Bruce Momjian
Date: 25 March 2008, 00:25:44
Subject: Re: [pgsql-www] New email list for emergency communications

From: "Stephen Denne"
Date: 25 March 2008, 00:42:29
Subject: Re: count(*) performance improvement ideas

Re: Recovery of Multi-stage WAL actions - Mailing list pgsql-hackers

Previous

Next