Re: Allow WAL information to recover corrupted pg_controldata - Mailing list pgsql-hackers

From Cédric Villemain
Subject Re: Allow WAL information to recover corrupted pg_controldata
Date
Msg-id 201206152249.50960.cedric@2ndquadrant.com
In response to Re: Allow WAL information to recover corrupted pg_controldata  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Allow WAL information to recover corrupted pg_controldata
List pgsql-hackers
On Friday 15 June 2012 03:27:11, Amit Kapila wrote:
> > I guess my first question is: why do we need this?  There are lots of
> > things in the TODO list that someone wanted once upon a time, but
> > they're not all actually important.  Do you have reason to believe
> > that this one is?  It's been six years since that email, so it's worth
> > asking if this is actually relevant.
>
> As far as I know, pg_control is not WAL-protected, which means that if it
> gets corrupted for any reason (for example, a disk crash during flush
> leaves it partially written), recovery of the database might fail.

AFAIR pg_controldata fits in a single disk sector, so it cannot be half written.

> In that case the user can run pg_resetxlog to recover the database.
> Currently pg_resetxlog works from guessed values for pg_control. This
> implementation improves on that: instead of guessing, it tries to
> regenerate the values from WAL, which can allow better recovery in
> certain circumstances.
>
> > The deadline for patches for this CommitFest is today, so I think you
> > should target any work you're starting now for the NEXT CommitFest.
>
> Oh, I am sorry, as this was my first time I was not fully aware of the
> deadline.
>
> However, I would still like your opinion on whether it makes sense to
> work on this feature.
>
>
> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> Sent: Friday, June 15, 2012 12:40 AM
> To: Amit Kapila
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Allow WAL information to recover corrupted
> pg_controldata
>
> On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> > I am planning to work on the below Todo list item for this CommitFest
> > Allow WAL information to recover corrupted pg_controldata
> > http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php
>
> The deadline for patches for this CommitFest is today, so I think you
> should target any work you're starting now for the NEXT CommitFest.
>
> > I wanted to confirm my understanding of the work involved for this patch:
> > The existing patch has the following problems:
> >    1. It leaks memory, and the linked-list code path is not correct.
> >    2. The check for whether the server is already running was removed
> >       by the patch; that removal needs to be reverted.
> >    3. The code needs refactoring.
> >
> > Apart from the above, what I understood from the patch is that its
> > intention is to generate values for ControlFile from the WAL when the
> > -r option is used.
> >
> > The change from the current algorithm is that if the control file is
> > corrupt, which essentially means ReadControlFile() returns false, then
> > it should generate the values (checkPointCopy, checkPoint,
> > prevCheckPoint, state) from WAL if the -r option is enabled.
> >
> > Also, with the -r option it doesn't need to call FindEndOfXLOG(), as
> > that work will be achieved by the above point.
> >
> > It will just rewrite the control file and not do the other resets.
> >
> >
> > The algorithm for restoring the pg_control values from old xlog files:
> >    1. Retrieve all of the active xlog files from the xlog directory into
> >       a list in increasing order, according to their timeline, log id,
> >       and segment id.
> >    2. Search the list to find the oldest xlog file of the latest
> >       timeline.
> >    3. Scan the records from the oldest xlog file of the latest timeline
> >       to the latest xlog file of the latest timeline; whenever a
> >       checkpoint record is found, update the latest checkpoint and the
> >       previous checkpoint.
>
> > Apart from the above, some code changes will be required after the
> > Xlog patch by Heikki.
> >
> > Please tell me whether my understanding is correct.
>
> I guess my first question is: why do we need this?  There are lots of
> things in the TODO list that someone wanted once upon a time, but
> they're not all actually important.  Do you have reason to believe
> that this one is?  It's been six years since that email, so it's worth
> asking if this is actually relevant.
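
For readers following the three-step algorithm quoted above, here is a rough
sketch of its shape in Python. It is not the code from the 2006 patch. It
relies on WAL segment file names being 24 hex digits (timeline, log id,
segment id). The names Segment, parse_segment, latest_timeline_segments and
track_checkpoints are made up for illustration, and the (kind, location)
record pairs are a stand-in for real WAL records, which would have to be
decoded and CRC-checked first.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# WAL segment file names: 8 hex digits each for timeline, log id, segment id.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


class Segment(NamedTuple):
    timeline: int
    log_id: int
    seg_id: int
    name: str


def parse_segment(name: str) -> Optional[Segment]:
    """Return a Segment for a WAL segment file name, or None for other files."""
    if not SEGMENT_NAME.match(name):
        return None
    return Segment(int(name[0:8], 16), int(name[8:16], 16),
                   int(name[16:24], 16), name)


def latest_timeline_segments(names: Iterable[str]) -> List[Segment]:
    """Steps 1 and 2: sort the segments, keep those of the latest timeline.

    The first element of the result is the oldest segment of that timeline.
    """
    segments = sorted(s for s in map(parse_segment, names) if s is not None)
    if not segments:
        return []
    latest = segments[-1].timeline
    return [s for s in segments if s.timeline == latest]


def track_checkpoints(
    records: Iterable[Tuple[str, int]],
) -> Tuple[Optional[int], Optional[int]]:
    """Step 3: scan records in order, remembering the last two checkpoints.

    Each record is a hypothetical (kind, location) pair, not a real WAL record.
    """
    latest = previous = None
    for kind, location in records:
        if kind == "checkpoint":
            previous, latest = latest, location
    return latest, previous
```

Calling latest_timeline_segments() on the file names found in the xlog
directory gives the scan order for step 3; timeline history files and backup
label files are skipped because their names do not match the 24-digit pattern.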

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Development, Expertise and Training
