Re: Recovery of Multi-stage WAL actions - Mailing list pgsql-hackers
| From | Bruce Momjian |
| --- | --- |
| Subject | Re: Recovery of Multi-stage WAL actions |
| Date | |
| Msg-id | 200803250029.m2P0TaH27926@momjian.us |
| In response to | Recovery of Multi-stage WAL actions (Simon Riggs <simon@2ndquadrant.com>) |
| List | pgsql-hackers |
Added to TODO:

> * Have resource managers report the duration of their status changes
>
>   http://archives.postgresql.org/pgsql-hackers/2007-10/msg01468.php

---------------------------------------------------------------------------

Simon Riggs wrote:
> We've had two hard-to-diagnose errors in recovery in recent months. ISTM that the core issue is the way we allow Resource Managers to have multi-stage WAL actions that persist for long periods of time. This means we have no way of telling whether the answer rm_safe_restartpoint() == false is a momentary, valid state or a progressively worsening indicator of a subtle RM bug.
>
> An example of a multi-stage WAL action would be an index split inside one of the Resource Managers. That kind of action shouldn't take very long, though theoretically it could for various reasons.
>
> Right now we have a log message to cope with this: http://archives.postgresql.org/pgsql-committers/2007-08/msg00374.php but it's not nearly as helpful as we'd like it to be. We could back-patch this to 8.2, but I have a potentially better proposal.
>
> I very much want to encourage authors of new Resource Managers, and it looks like we may be getting at least 3 new RMs that produce WAL records: hash indexes (currently not WAL-logged), bitmap indexes and clustered indexes for 8.4. We should be realistic that new bugs probably will occur in recovery code for existing and new RMs.
>
> What I'd like to do is force all of the RMs to record the LSN of any WAL record that starts an incomplete action. Then, if an incomplete action lives for more than a certain period of time, it will be possible to produce a log message saying "incomplete split has survived for X seconds, in xlog time". That way we'll see log messages if any of the RMs push an incomplete action onto their list and then never consume it again.
>
> We might trust each RM to implement code to LOG messages if its code goes a little awry, but I'd prefer some mechanism that allows the main server to check what's happening in each RM. That way we'd have a cross-check on whether the RM is well-behaved, plus we'd only need to implement the checking code once. Right now very similar, yet different, code runs inside each RM.
>
> So my proposal is to have an incomplete-split remember/forget API that forces each RM to expose its incomplete-split List. Currently each RM has a hook, rm_safe_restartpoint(), so that each RM manages its own List. My new thought is to have one function, safeRestartPoint(), that inspects each of the incomplete-split Lists to see if they are empty. If the lists are non-empty, then inspect the age of each list entry to see if it is worth reporting as a possible issue. Each RM would then store incomplete splits using ResourceManagerRememberIncompleteEvent(lsn, id_data, payload??) and ResourceManagerForgetIncompleteEvent(id_data). Implementation is a bit hazy on that last part, but I think the overall idea is clear.
>
> That should mean that any incomplete split that lasts for the length of one restartpoint, which is *at least* one checkpoint duration, should cause a LOG message to be produced. We might even go as far as to ignore super-long-lived and therefore spurious incomplete splits when we issue rm_cleanup(), for fear of allowing RM bugs to kill recovery.
>
> I'd like to suggest that those changes be performed now for 8.3 *and* back-patched for 8.2. I want to make sure that all users are able to diagnose server errors and report them. I'm guessing that might raise a few eyebrows, but I think it's justifiable. Bugs in complex code are inevitable and should not be seen to reflect badly upon RM authors. However, our inability to recognise RM bugs that do occur doesn't seem acceptable to me, especially since they may save themselves up for the moment of PITR fail-over. You might persuade me I'm being over-zealous here, but High Availability is something we have to be zealous about.
>
> It should also be possible to allow the server to stay up even if one of the RMs fails to recover properly. That would need to be settable, so I really only mean it for "optional" RMs, i.e. index RMs only. For those cases we should be able to handle affected indexes by marking them corrupt. Automatic rebuild of corrupt indexes could also be possible, should corruption occur. That would be an 8.4 action... :-)
>
> Comments appreciated, as ever.
>
> --
>  Simon Riggs
>  2ndQuadrant   http://www.2ndQuadrant.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
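To make the remember/forget idea in the quoted proposal concrete, here is a minimal C sketch of centralized incomplete-action bookkeeping. It is purely illustrative: the names (RememberIncompleteAction, ForgetIncompleteAction, SafeRestartPoint), the linked-list storage, the id_data key format, and the WAL-distance reporting are assumptions made for this example, not actual PostgreSQL code and not necessarily the exact API Simon had in mind.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long long XLogPtr;     /* stand-in for XLogRecPtr */

typedef struct IncompleteAction
{
    int         rmid;                   /* resource manager that registered it */
    XLogPtr     start_lsn;              /* WAL position that began the action */
    char        id_data[32];            /* RM-specific key, e.g. rel/block ids */
    struct IncompleteAction *next;
} IncompleteAction;

static IncompleteAction *incomplete_actions = NULL;

/* An RM calls this when redo replays the record that starts a multi-stage action. */
static void
RememberIncompleteAction(int rmid, XLogPtr lsn, const char *id_data)
{
    IncompleteAction *a = malloc(sizeof(IncompleteAction));

    a->rmid = rmid;
    a->start_lsn = lsn;
    strncpy(a->id_data, id_data, sizeof(a->id_data) - 1);
    a->id_data[sizeof(a->id_data) - 1] = '\0';
    a->next = incomplete_actions;
    incomplete_actions = a;
}

/* An RM calls this when the matching completion record is replayed. */
static void
ForgetIncompleteAction(int rmid, const char *id_data)
{
    IncompleteAction **p = &incomplete_actions;

    while (*p != NULL)
    {
        if ((*p)->rmid == rmid && strcmp((*p)->id_data, id_data) == 0)
        {
            IncompleteAction *dead = *p;

            *p = dead->next;
            free(dead);
            return;
        }
        p = &(*p)->next;
    }
}

/*
 * Central check at restartpoint time: safe only when no RM has an
 * outstanding action.  Otherwise report how far behind each entry is,
 * measured here as WAL distance rather than wall-clock seconds.
 */
static int
SafeRestartPoint(XLogPtr current_lsn)
{
    IncompleteAction *a;

    for (a = incomplete_actions; a != NULL; a = a->next)
        fprintf(stderr,
                "incomplete action from rmid %d still open after %llu WAL bytes\n",
                a->rmid, current_lsn - a->start_lsn);

    return (incomplete_actions == NULL);
}

int
main(void)
{
    /* Simulate one split that completes and one whose completion never arrives. */
    RememberIncompleteAction(1, 1000, "rel 16384 blk 42");
    RememberIncompleteAction(2, 1200, "rel 16390 blk 7");
    ForgetIncompleteAction(2, "rel 16390 blk 7");

    if (!SafeRestartPoint(900000))
        fprintf(stderr, "restartpoint skipped: incomplete actions remain\n");
    return 0;
}
```

In the proposal's terms, each RM's redo routine would call the remember function on the first record of a multi-stage action and the forget function when the completing record arrives; the per-RM rm_safe_restartpoint() hooks could then be replaced by a single central check that also knows how long each entry has been outstanding and can LOG the stragglers.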