Re: Allow WAL information to recover corrupted pg_controldata - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Allow WAL information to recover corrupted pg_controldata
Date
Msg-id 002701cd4a95$fd9b0530$f8d10f90$@kapila@huawei.com
In response to Re: Allow WAL information to recover corrupted pg_controldata  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Allow WAL information to recover corrupted pg_controldata
List pgsql-hackers
> I guess my first question is: why do we need this?  There are lots of
> things in the TODO list that someone wanted once upon a time, but
> they're not all actually important.  Do you have reason to believe
> that this one is?  It's been six years since that email, so it's worth
> asking if this is actually relevant.

As far as I know, pg_control is not WAL-protected, which means that if it gets
corrupted for any reason (for example, a disk crash during a flush that leaves
it partially written), recovery of the database may fail.
The user can then run pg_resetxlog to recover the database. Currently,
pg_resetxlog works from guessed values for pg_control.
This implementation can improve on that: instead of guessing, it can try to
regenerate the values from WAL, which allows better recovery in certain
circumstances.

> The deadline for patches for this CommitFest is today, so I think you
> should target any work you're starting now for the NEXT CommitFest.

Oh, I am sorry; as this was my first time, I was not fully aware of the
deadline.

However, I would still like your opinion on whether it makes sense to work on
this feature.


-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Friday, June 15, 2012 12:40 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Allow WAL information to recover corrupted
pg_controldata

On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila <amit.kapila@huawei.com>
wrote:
> I am planning to work on the below Todo list item for this CommitFest
> Allow WAL information to recover corrupted pg_controldata
> http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php

The deadline for patches for this CommitFest is today, so I think you
should target any work you're starting now for the NEXT CommitFest.

> I wanted to confirm my understanding of the work involved in this patch:
> The existing patch has the following problems:
>    1. Memory leak, and the linked list code path is not proper
>    2. The lock check for whether the server is already running is removed
>       in the patch, which needs to be reverted
>    3. Refactoring of the code.
>
> Apart from the above, what I understood from the patch is that its intention
> is to generate values for ControlFile using WAL logs when the -r option is
> used.
>
> The change from the current algorithm is that if the control file is corrupt,
> which essentially means ReadControlFile() returns false, then it should
> generate values (checkPointCopy, checkPoint, prevCheckPoint, state) using
> WAL if the -r option is enabled.
>
> Also, for the -r option, it doesn't need to call FindEndOfXLOG(), as
> that work will be achieved by the above point.
>
> It will just rewrite the control file and not do other resets.
>
>
> The algorithm for restoring the pg_control values from old xlog files:
>    1. Retrieve all of the active xlog files from the xlog directory into a
>       list in increasing order, according to their timeline, log id, and
>       segment id.
>    2. Search the list to find the oldest xlog file of the latest timeline.
>    3. Search the records from the oldest xlog file of the latest timeline to
>       the latest xlog file of the latest timeline; if a checkpoint record
>       is found, update the latest checkpoint and previous checkpoint.
>
>
>
> Apart from the above, some changes in the code will be required after the
> Xlog patch by Heikki.
>
> Please tell me whether my understanding is correct.
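
For illustration, the three restore steps quoted above could be sketched
roughly as follows. This is a minimal Python sketch, not code from the patch:
the helper names (parse_segment, segments_to_scan, track_checkpoints) are
hypothetical, it assumes xlog file names consist of 24 uppercase hex digits
(timeline, log id, segment id, 8 digits each), and it takes the decoded
records as ready-made (lsn, is_checkpoint) pairs rather than reading real WAL.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumption: a segment file name is 24 hex digits (timeline, log id, segment id).
_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


class Segment(NamedTuple):
    timeline: int
    log_id: int
    seg_id: int
    name: str


def parse_segment(name: str) -> Optional[Segment]:
    """Split a segment file name into its parts; return None for other files."""
    if not _SEGMENT_NAME.match(name):
        return None
    return Segment(int(name[0:8], 16), int(name[8:16], 16),
                   int(name[16:24], 16), name)


def segments_to_scan(file_names: List[str]) -> List[str]:
    """Steps 1 and 2: sort by (timeline, log id, segment id) and keep the
    latest timeline's files, oldest first."""
    segments = sorted(s for s in map(parse_segment, file_names) if s is not None)
    if not segments:
        return []
    latest_timeline = segments[-1].timeline
    return [s.name for s in segments if s.timeline == latest_timeline]


def track_checkpoints(
    records: Iterable[Tuple[int, bool]],
) -> Tuple[Optional[int], Optional[int]]:
    """Step 3: walk records in order; each checkpoint record becomes the
    latest checkpoint and the old latest becomes the previous one."""
    latest = previous = None
    for lsn, is_checkpoint in records:
        if is_checkpoint:
            previous, latest = latest, lsn
    return latest, previous
```

Given the names 000000010000000000000003, 000000020000000000000005,
000000020000000000000004 and 00000002.history, segments_to_scan() would return
the two timeline-2 segments with ...04 before ...05, skipping the history file.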

I guess my first question is: why do we need this?  There are lots of
things in the TODO list that someone wanted once upon a time, but
they're not all actually important.  Do you have reason to believe
that this one is?  It's been six years since that email, so it's worth
asking if this is actually relevant.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


