Thread: Allow WAL information to recover corrupted pg_controldata

Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

14 June 2012, 12:40:26

<div class="WordSection1"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">I am
planningto work on the below Todo list item for this CommitFest</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">AllowWAL information to recover corrupted pg_controldata
</span><br/><span style="font-size:10.0pt;font-family:"Arial","sans-serif""><a
href="http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php">http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php</a></span><br
/><spanstyle="font-size:10.0pt;font-family:"Arial","sans-serif"">   </span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Iwanted to confirm my understanding about the work involved
forthis patch:</span><br /><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">The existing patch has
followingset of problems:</span><br /><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">   1. Memory leak
andlinked list code path is not proper</span><br /><span style="font-size:10.0pt;font-family:"Arial","sans-serif""> 
 2.lock check for if the server is already running, is removed in patch which needs to be reverted</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">  3. Refactoring of the code.</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">  </span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Apartfrom above what I understood from the patch is that its
intentionis to generate values for ControlFile using WAL logs when -r option is used.</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">  </span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Thechange in algorithm from current will be if control file
iscorrupt which</span> <span style="font-size:10.0pt;font-family:"Arial","sans-serif"">essentialy means
ReadControlFile()will return False, then it should</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">generatevalues (checkPointCopy, checkPoint, prevCheckPoint,
state)using WAL if -r option is enabled.</span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Alsofor -r option, it doesn't need to call function
FindEndOfXLOG()as the that work will be achieved</span> <span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">byabove point. </span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Itwill just rewrite the control file and don’t do other
resets.</span><br/><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">   </span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> </span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Thealgorithm of restoring the pg_control value from old xlog
file:</span><br/><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">   1. Retrieve all of the active xlog
filesfrom xlog direcotry into a list by increasing order, according their timeline, log id, segment id.</span><br
/><spanstyle="font-size:10.0pt;font-family:"Arial","sans-serif"">   2. Search the list to find the oldest xlog file of
thelastest time line.</span><br /><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">   3. Search the
recordsfrom the oldest xlog file of latest time line</span> <span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">tothe latest xlog file of latest time line, if the checkpoint
record</span><br/><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">      has been found, update the
latestcheckpoint and previous checkpoint.</span><p class="MsoNormal"> <p class="MsoNormal">Apart from above some
changesin code will be required after the Xlog patch by Heikki.<br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">     </span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Suggestme if my understanding is correct?</span><p
class="MsoNormal"> <pclass="MsoNormal"> <p class="MsoNormal"> </div>

Re: Allow WAL information to recover corrupted pg_controldata

From

Robert Haas

Date:

14 June 2012, 16:10:20

On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> I am planning to work on the below Todo list item for this CommitFest
> Allow WAL information to recover corrupted pg_controldata
> http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php

The deadline for patches for this CommitFest is today, so I think you
should target any work you're starting now for the NEXT CommitFest.

> I wanted to confirm my understanding about the work involved for this patch:
> The existing patch has following set of problems:
>    1. Memory leak and linked list code path is not proper
>    2. lock check for if the server is already running, is removed in patch
> which needs to be reverted
>    3. Refactoring of the code.
>
> Apart from above what I understood from the patch is that its intention is
> to generate values for ControlFile using WAL logs when -r option is used.
>
> The change in algorithm from current will be if control file is corrupt
> which essentialy means ReadControlFile() will return False, then it should
> generate values (checkPointCopy, checkPoint, prevCheckPoint, state) using
> WAL if -r option is enabled.
>
> Also for -r option, it doesn't need to call function FindEndOfXLOG() as the
> that work will be achieved by above point.
>
> It will just rewrite the control file and don’t do other resets.
>
>
> The algorithm of restoring the pg_control value from old xlog file:
>    1. Retrieve all of the active xlog files from xlog direcotry into a list
> by increasing order, according their timeline, log id, segment id.
>    2. Search the list to find the oldest xlog file of the lastest time line.
>    3. Search the records from the oldest xlog file of latest time line to
> the latest xlog file of latest time line, if the checkpoint record
>       has been found, update the latest checkpoint and previous checkpoint.
>
>
>
> Apart from above some changes in code will be required after the Xlog patch
> by Heikki.
>
> Suggest me if my understanding is correct?

I guess my first question is: why do we need this?  There are lots of
things in the TODO list that someone wanted once upon a time, but
they're not all actually important.  Do you have reason to believe
that this one is?  It's been six years since that email, so it's worth
asking if this is actually relevant.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

14 June 2012, 22:27:39

> I guess my first question is: why do we need this?  There are lots of
> things in the TODO list that someone wanted once upon a time, but
> they're not all actually important.  Do you have reason to believe
> that this one is?  It's been six years since that email, so it's worth
> asking if this is actually relevant.

As far as I know the pg_control is not WAL protected, which means if it gets
corrupt due
to any reason (disk crash during flush, so written partially), it might lead
to failure in recovery of database.
So user can use pg_resetxlog to recover the database. Currently pg_resetxlog
works on guessed values for pg_control.
However this implementation can improve the logic that instead of guessing,
it can try to regenerate the values from
WAL.
This implementation can allow better recovery in certain circumstances.

> The deadline for patches for this CommitFest is today, so I think you
> should target any work you're starting now for the NEXT CommitFest.

Oh, I am sorry, as this was my first time I was not fully aware of the
deadline.

However I still seek your opinion whether it makes sense to work on this
feature.


-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Friday, June 15, 2012 12:40 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Allow WAL information to recover corrupted
pg_controldata

On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila <amit.kapila@huawei.com>
wrote:
> I am planning to work on the below Todo list item for this CommitFest
> Allow WAL information to recover corrupted pg_controldata
> http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php

The deadline for patches for this CommitFest is today, so I think you
should target any work you're starting now for the NEXT CommitFest.

> I wanted to confirm my understanding about the work involved for this
patch:
> The existing patch has following set of problems:
>    1. Memory leak and linked list code path is not proper
>    2. lock check for if the server is already running, is removed in patch
> which needs to be reverted
>    3. Refactoring of the code.
>
> Apart from above what I understood from the patch is that its intention is
> to generate values for ControlFile using WAL logs when -r option is used.
>
> The change in algorithm from current will be if control file is corrupt
> which essentialy means ReadControlFile() will return False, then it should
> generate values (checkPointCopy, checkPoint, prevCheckPoint, state) using
> WAL if -r option is enabled.
>
> Also for -r option, it doesn't need to call function FindEndOfXLOG() as
the
> that work will be achieved by above point.
>
> It will just rewrite the control file and dont do other resets.
>
>
> The algorithm of restoring the pg_control value from old xlog file:
>    1. Retrieve all of the active xlog files from xlog direcotry into a
list
> by increasing order, according their timeline, log id, segment id.
>    2. Search the list to find the oldest xlog file of the lastest time
line.
>    3. Search the records from the oldest xlog file of latest time line to
> the latest xlog file of latest time line, if the checkpoint record
>       has been found, update the latest checkpoint and previous
checkpoint.
>
>
>
> Apart from above some changes in code will be required after the Xlog
patch
> by Heikki.
>
> Suggest me if my understanding is correct?

I guess my first question is: why do we need this?  There are lots of
things in the TODO list that someone wanted once upon a time, but
they're not all actually important.  Do you have reason to believe
that this one is?  It's been six years since that email, so it's worth
asking if this is actually relevant.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Allow WAL information to recover corrupted pg_controldata

From

Cédric Villemain

Date:

15 June 2012, 17:50:18

Le vendredi 15 juin 2012 03:27:11, Amit Kapila a écrit :
> > I guess my first question is: why do we need this?  There are lots of
> > things in the TODO list that someone wanted once upon a time, but
> > they're not all actually important.  Do you have reason to believe
> > that this one is?  It's been six years since that email, so it's worth
> > asking if this is actually relevant.
>
> As far as I know the pg_control is not WAL protected, which means if it
> gets corrupt due
> to any reason (disk crash during flush, so written partially), it might
> lead to failure in recovery of database.

AFAIR pg_controldata fit on a disk sector so it can not be half written.

> So user can use pg_resetxlog to recover the database. Currently
> pg_resetxlog works on guessed values for pg_control.
> However this implementation can improve the logic that instead of guessing,
> it can try to regenerate the values from
> WAL.
> This implementation can allow better recovery in certain circumstances.
>
> > The deadline for patches for this CommitFest is today, so I think you
> > should target any work you're starting now for the NEXT CommitFest.
>
> Oh, I am sorry, as this was my first time I was not fully aware of the
> deadline.
>
> However I still seek your opinion whether it makes sense to work on this
> feature.
>
>
> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> Sent: Friday, June 15, 2012 12:40 AM
> To: Amit Kapila
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Allow WAL information to recover corrupted
> pg_controldata
>
> On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila <amit.kapila@huawei.com>
>
> wrote:
> > I am planning to work on the below Todo list item for this CommitFest
> > Allow WAL information to recover corrupted pg_controldata
> > http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php
>
> The deadline for patches for this CommitFest is today, so I think you
> should target any work you're starting now for the NEXT CommitFest.
>
> > I wanted to confirm my understanding about the work involved for this
>
> patch:
> > The existing patch has following set of problems:
> >    1. Memory leak and linked list code path is not proper
> >    2. lock check for if the server is already running, is removed in
> > patch which needs to be reverted
> >    3. Refactoring of the code.
> >
> > Apart from above what I understood from the patch is that its intention
> > is to generate values for ControlFile using WAL logs when -r option is
> > used.
> >
> > The change in algorithm from current will be if control file is corrupt
> > which essentialy means ReadControlFile() will return False, then it
> > should generate values (checkPointCopy, checkPoint, prevCheckPoint,
> > state) using WAL if -r option is enabled.
> >
> > Also for -r option, it doesn't need to call function FindEndOfXLOG() as
>
> the
>
> > that work will be achieved by above point.
> >
> > It will just rewrite the control file and dont do other resets.
> >
> >
> > The algorithm of restoring the pg_control value from old xlog file:
> >    1. Retrieve all of the active xlog files from xlog direcotry into a
>
> list
>
> > by increasing order, according their timeline, log id, segment id.
> >    2. Search the list to find the oldest xlog file of the lastest time
>
> line.
>
> >    3. Search the records from the oldest xlog file of latest time line to
> > the latest xlog file of latest time line, if the checkpoint record
> >       has been found, update the latest checkpoint and previous
>
> checkpoint.
>
> > Apart from above some changes in code will be required after the Xlog
>
> patch
>
> > by Heikki.
> >
> > Suggest me if my understanding is correct?
>
> I guess my first question is: why do we need this?  There are lots of
> things in the TODO list that someone wanted once upon a time, but
> they're not all actually important.  Do you have reason to believe
> that this one is?  It's been six years since that email, so it's worth
> asking if this is actually relevant.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit kapila

Date:

16 June 2012, 03:01:26

> > > I guess my first question is: why do we need this?  There are lots of
> > > things in the TODO list that someone wanted once upon a time, but
> > > they're not all actually important.  Do you have reason to believe
> > > that this one is?  It's been six years since that email, so it's worth
> > > asking if this is actually relevant.
>
>> As far as I know the pg_control is not WAL protected, which means if it
>> gets corrupt due
>> to any reason (disk crash during flush, so written partially), it might
>> lead to failure in recovery of database.

> AFAIR pg_controldata fit on a disk sector so it can not be half written.  It can be corrupt due to some other reasons
aswell like torn disk sector. As already pg_resetxlog has a mechanism to recover corrupt pg_control file, so it is
alreadyconsidered that it can be corrupt in some case.The suggested patch improves the logic to recover corrupt control
file.So that is the reason I felt it will be relevant to do this patch. 
________________________________________
From: Cédric Villemain [cedric@2ndquadrant.com]
Sent: Saturday, June 16, 2012 2:19 AM
To: pgsql-hackers@postgresql.org
Cc: Amit kapila; 'Robert Haas'
Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

Le vendredi 15 juin 2012 03:27:11, Amit Kapila a écrit :
> > I guess my first question is: why do we need this?  There are lots of
> > things in the TODO list that someone wanted once upon a time, but
> > they're not all actually important.  Do you have reason to believe
> > that this one is?  It's been six years since that email, so it's worth
> > asking if this is actually relevant.
>
> As far as I know the pg_control is not WAL protected, which means if it
> gets corrupt due
> to any reason (disk crash during flush, so written partially), it might
> lead to failure in recovery of database.

AFAIR pg_controldata fit on a disk sector so it can not be half written.

> So user can use pg_resetxlog to recover the database. Currently
> pg_resetxlog works on guessed values for pg_control.
> However this implementation can improve the logic that instead of guessing,
> it can try to regenerate the values from
> WAL.
> This implementation can allow better recovery in certain circumstances.
>
> > The deadline for patches for this CommitFest is today, so I think you
> > should target any work you're starting now for the NEXT CommitFest.
>
> Oh, I am sorry, as this was my first time I was not fully aware of the
> deadline.
>
> However I still seek your opinion whether it makes sense to work on this
> feature.
>
>
> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> Sent: Friday, June 15, 2012 12:40 AM
> To: Amit Kapila
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Allow WAL information to recover corrupted
> pg_controldata
>
> On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila <amit.kapila@huawei.com>
>
> wrote:
> > I am planning to work on the below Todo list item for this CommitFest
> > Allow WAL information to recover corrupted pg_controldata
> > http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php
>
> The deadline for patches for this CommitFest is today, so I think you
> should target any work you're starting now for the NEXT CommitFest.
>
> > I wanted to confirm my understanding about the work involved for this
>
> patch:
> > The existing patch has following set of problems:
> >    1. Memory leak and linked list code path is not proper
> >    2. lock check for if the server is already running, is removed in
> > patch which needs to be reverted
> >    3. Refactoring of the code.
> >
> > Apart from above what I understood from the patch is that its intention
> > is to generate values for ControlFile using WAL logs when -r option is
> > used.
> >
> > The change in algorithm from current will be if control file is corrupt
> > which essentialy means ReadControlFile() will return False, then it
> > should generate values (checkPointCopy, checkPoint, prevCheckPoint,
> > state) using WAL if -r option is enabled.
> >
> > Also for -r option, it doesn't need to call function FindEndOfXLOG() as
>
> the
>
> > that work will be achieved by above point.
> >
> > It will just rewrite the control file and don’t do other resets.
> >
> >
> > The algorithm of restoring the pg_control value from old xlog file:
> >    1. Retrieve all of the active xlog files from xlog direcotry into a
>
> list
>
> > by increasing order, according their timeline, log id, segment id.
> >    2. Search the list to find the oldest xlog file of the lastest time
>
> line.
>
> >    3. Search the records from the oldest xlog file of latest time line to
> > the latest xlog file of latest time line, if the checkpoint record
> >       has been found, update the latest checkpoint and previous
>
> checkpoint.
>
> > Apart from above some changes in code will be required after the Xlog
>
> patch
>
> > by Heikki.
> >
> > Suggest me if my understanding is correct?
>
> I guess my first question is: why do we need this?  There are lots of
> things in the TODO list that someone wanted once upon a time, but
> they're not all actually important.  Do you have reason to believe
> that this one is?  It's been six years since that email, so it's worth
> asking if this is actually relevant.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: Allow WAL information to recover corrupted pg_controldata

From

Tom Lane

Date:

16 June 2012, 03:41:22

Amit kapila <amit.kapila@huawei.com> writes:
>> AFAIR pg_controldata fit on a disk sector so it can not be half written.

> It can be corrupt due to some other reasons as well like torn disk sector.

"Torn disk sector"? Please, this is nonsense. Disks cannot write half
a sector and then stop. A sufficiently badly designed drive might
attempt to start a write when it didn't have enough power left to finish
... but the result of that would be a corrupt sector with a non-matching
CRC, not one that read back okay but contained erroneous data.

> The suggested patch improves the logic to recover corrupt control file. So that is the reason I felt it will be
relevantto do this patch.

Well, we invented pg_resetxlog with the thought that it might be useful
for such situations, but I'm not sure offhand that we've ever seen a
field report of corrupted pg_control files. For instance, a quick
search in the archives for "incorrect checksum in control file" turns up
only cases of pilot error, such as supposing that a 32-bit database
could be used with a 64-bit server or vice versa. Actual hardware
failures on the pg_control file could be expected to result in something
like "could not read from control file: I/O error", which I find no
evidence for at all in the archives.

Before adding new code to improve the situation, it would be good to
have (a) evidence that there's a problem worth solving, and (b) a theory
as to what likely-to-occur cases the new code is going to make better,
while not making things worse in other likely-to-occur cases. Case in
point here is that it's not immediately obvious that we should trust
the contents of WAL more than pg_control --- the former gets a whole
lot more write traffic and hence has many more opportunities for
failure.

At the moment I don't see that we have either (a) or (b), so I think
it's pretty dubious to be making any changes of this sort.
regards, tom lane

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit kapila

Date:

17 June 2012, 02:05:40

> "Torn disk sector"?  Please, this is nonsense.  Disks cannot write half a sector and then stop.    What I was
intendedto say is corruption due to hardware or some other problem, not because when    Postgres is updating pg_control
file.For example http://cquirke.mvps.org/9x/baddata.htm.  

> Well, we invented pg_resetxlog with the thought that it might be useful
>  for such situations, but I'm not sure offhand that we've ever seen a
>  field report of corrupted pg_control files.
I have found few cases where people have tried to use pg_resetxlog due to hardware problems or missing pg_control file.
http://archives.postgresql.org/pgsql-performance/2004-06/msg00236.php
http://archives.postgresql.org/pgsql-general/2004-06/msg00173.php
http://archives.postgresql.org/pgsql-admin/2006-12/msg00205.php

> Case in point here is that it's not immediately obvious that we should trust
> the contents of WAL more than pg_control.   Agreed.

> At the moment I don't see that we have either (a) or (b), so I think
> it's pretty dubious to be making any changes of this sort.  As the chances of usecase for this feature are very less,
SoI will stop working on this feature. 

________________________________________
From: Tom Lane [tgl@sss.pgh.pa.us]
Sent: Saturday, June 16, 2012 12:11 PM
To: Amit kapila
Cc: Cédric Villemain; pgsql-hackers@postgresql.org; 'Robert Haas'
Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

Amit kapila <amit.kapila@huawei.com> writes:
>> AFAIR pg_controldata fit on a disk sector so it can not be half written.

>    It can be corrupt due to some other reasons as well like torn disk sector.

"Torn disk sector"?  Please, this is nonsense.  Disks cannot write half
a sector and then stop.  A sufficiently badly designed drive might
attempt to start a write when it didn't have enough power left to finish
... but the result of that would be a corrupt sector with a non-matching
CRC, not one that read back okay but contained erroneous data.

>  The suggested patch improves the logic to recover corrupt control file. So that is the reason I felt it will be
relevantto do this patch. 

Well, we invented pg_resetxlog with the thought that it might be useful
for such situations, but I'm not sure offhand that we've ever seen a
field report of corrupted pg_control files.  For instance, a quick
search in the archives for "incorrect checksum in control file" turns up
only cases of pilot error, such as supposing that a 32-bit database
could be used with a 64-bit server or vice versa.  Actual hardware
failures on the pg_control file could be expected to result in something
like "could not read from control file: I/O error", which I find no
evidence for at all in the archives.

Before adding new code to improve the situation, it would be good to
have (a) evidence that there's a problem worth solving, and (b) a theory
as to what likely-to-occur cases the new code is going to make better,
while not making things worse in other likely-to-occur cases.  Case in
point here is that it's not immediately obvious that we should trust
the contents of WAL more than pg_control --- the former gets a whole
lot more write traffic and hence has many more opportunities for
failure.

At the moment I don't see that we have either (a) or (b), so I think
it's pretty dubious to be making any changes of this sort.
                       regards, tom lane

Re: Allow WAL information to recover corrupted pg_controldata

From

Alvaro Herrera

Date:

18 June 2012, 13:35:16

Excerpts from Tom Lane's message of sáb jun 16 02:41:00 -0400 2012:
> Amit kapila <amit.kapila@huawei.com> writes:

> >  The suggested patch improves the logic to recover corrupt control file. So that is the reason I felt it will be
relevantto do this patch. 
>
> Well, we invented pg_resetxlog with the thought that it might be useful
> for such situations, but I'm not sure offhand that we've ever seen a
> field report of corrupted pg_control files.  For instance, a quick
> search in the archives for "incorrect checksum in control file" turns up
> only cases of pilot error, such as supposing that a 32-bit database
> could be used with a 64-bit server or vice versa.  Actual hardware
> failures on the pg_control file could be expected to result in something
> like "could not read from control file: I/O error", which I find no
> evidence for at all in the archives.

Hm, what about the situation where pg_control is lost completely to a
filesystem failure?  I remember doing disaster recovery on this problem
once ... As far as I recall the pg_xlog files were in a separate
partition so they weren't lost.  Some other files in the main data
partition were lost as well.  (I don't remember what is it that we had
to do to create a fake pg_control).

Maybe, even if Amit's code does not end up in pg_resetxlog, it could be
useful as a DR tool, assuming the code does not cause endless
maintenance burden.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Allow WAL information to recover corrupted pg_controldata

From

Tom Lane

Date:

18 June 2012, 13:43:50

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of sáb jun 16 02:41:00 -0400 2012:
>> Well, we invented pg_resetxlog with the thought that it might be useful
>> for such situations, but I'm not sure offhand that we've ever seen a
>> field report of corrupted pg_control files.

> Hm, what about the situation where pg_control is lost completely to a
> filesystem failure?  I remember doing disaster recovery on this problem
> once ... As far as I recall the pg_xlog files were in a separate
> partition so they weren't lost.  Some other files in the main data
> partition were lost as well.

Hm ... well, as long as we have a clear idea of a use-case, I'm not
opposed in principle to working on this area.

> (I don't remember what is it that we had
> to do to create a fake pg_control).

AFAIR you can create pg_control from scratch already with pg_resetxlog.
The hard part is coming up with values for the counters, such as the
next WAL location.  Some of them such as next OID are pretty harmless
if you don't guess right, but I'm worried that wrong next WAL could
make things worse not better.
        regards, tom lane

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

18 June 2012, 14:53:14

> AFAIR you can create pg_control from scratch already with pg_resetxlog.
> The hard part is coming up with values for the counters, such as the
> next WAL location.  Some of them such as next OID are pretty harmless
> if you don't guess right, but I'm worried that wrong next WAL could
> make things worse not better.

I believe if WAL files are proper as mentioned in Alvaro's mail, the purposed logic should generate
correct values.
Do you see any problem in logic purposed in my original mail.
Can I resume my work on this feature?


With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Tom Lane

Date:

19 June 2012, 02:43:42

Amit Kapila <amit.kapila@huawei.com> writes:
>> AFAIR you can create pg_control from scratch already with pg_resetxlog.
>> The hard part is coming up with values for the counters, such as the
>> next WAL location.  Some of them such as next OID are pretty harmless
>> if you don't guess right, but I'm worried that wrong next WAL could
>> make things worse not better.

> I believe if WAL files are proper as mentioned in Alvaro's mail, the
> purposed logic should generate correct values.

I've got a problem with the assumption that, when pg_control is trash,
megabytes or gigabytes of WAL can still be relied on completely.

I'm almost inclined to suggest that we not get next-LSN from WAL, but
by scanning all the pages in the main data store and computing the max
observed LSN.  This is clearly not very attractive from a performance
standpoint, but it would avoid the obvious failure mode where you lost
some recent WAL segments along with pg_control.
        regards, tom lane

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

19 June 2012, 05:49:41

> I'm almost inclined to suggest that we not get next-LSN from WAL, but
> by scanning all the pages in the main data store and computing the max
> observed LSN.  This is clearly not very attractive from a performance
> standpoint, but it would avoid the obvious failure mode where you lost
> some recent WAL segments along with pg_control.

If we follow this approach, what should be handling in case next-LSN is greater than
last checkpoint record location read from WAL files. Currently I can see StratUpXLOG throws
PANIC error in such situation.
I think this can happen in case of missing some recent WAL segments.

With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Fujii Masao

Date:

19 June 2012, 11:14:37

On Tue, Jun 19, 2012 at 2:44 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> AFAIR you can create pg_control from scratch already with pg_resetxlog.
>> The hard part is coming up with values for the counters, such as the
>> next WAL location.  Some of them such as next OID are pretty harmless
>> if you don't guess right, but I'm worried that wrong next WAL could
>> make things worse not better.
>
> I believe if WAL files are proper as mentioned in Alvaro's mail, the purposed logic should generate
> correct values.
> Do you see any problem in logic purposed in my original mail.
> Can I resume my work on this feature?

Maybe I'm missing your point, but... why don't you just use PITR to
recover from the corruption of pg_control?

Regards,

--
Fujii Masao

Re: Allow WAL information to recover corrupted pg_controldata

From

Robert Haas

Date:

19 June 2012, 16:29:19

On Tue, Jun 19, 2012 at 1:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila@huawei.com> writes:
>>> AFAIR you can create pg_control from scratch already with pg_resetxlog.
>>> The hard part is coming up with values for the counters, such as the
>>> next WAL location.  Some of them such as next OID are pretty harmless
>>> if you don't guess right, but I'm worried that wrong next WAL could
>>> make things worse not better.
>
>> I believe if WAL files are proper as mentioned in Alvaro's mail, the
>> purposed logic should generate correct values.
>
> I've got a problem with the assumption that, when pg_control is trash,
> megabytes or gigabytes of WAL can still be relied on completely.
>
> I'm almost inclined to suggest that we not get next-LSN from WAL, but
> by scanning all the pages in the main data store and computing the max
> observed LSN.  This is clearly not very attractive from a performance
> standpoint, but it would avoid the obvious failure mode where you lost
> some recent WAL segments along with pg_control.

I think it could be useful to have a tool that scans all the blocks
and computes that value, but I'd want it to just print the value out
and let me decide what to do about it.  There are cases where you
don't necessarily want to clobber pg_control, but you do have future
LSNs in your data file pages.  This can be either because the disk ate
your WAL, or because you didn't create recovery.conf, or because your
disk corrupted the LSNs on the data file pages.  I'd want a tool that
could be either run on an individual file, or recursively on a
directory.

In terms of the TODO item, I haven't yet heard anyone clearly state "I
wanted to use pg_controldata but it couldn't because X so therefore we
need this patch".  Alvaro mentioned the case where pg_control is
missing altogether, but:

[rhaas pgsql]$ rm ~/pgdata/global/pg_control
[rhaas pgsql]$ postgres
postgres: could not find the database system
Expected to find it in the directory "/Users/rhaas/pgdata",
but could not open file "/Users/rhaas/pgdata/global/pg_control": No
such file or directory
[rhaas pgsql]$ pg_resetxlog ~/pgdata
pg_resetxlog: could not open file "global/pg_control" for reading: No
such file or directory
If you are sure the data directory path is correct, execute touch global/pg_control
and try again.
[rhaas pgsql]$ touch ~/pgdata/global/pg_control
[rhaas pgsql]$ pg_resetxlog ~/pgdata
pg_resetxlog: pg_control exists but is broken or unknown version; ignoring it
Guessed pg_control values:

First log file ID after reset:        0
First log file segment after reset:   69
pg_control version number:            922
Catalog version number:               201206141
Database system identifier:           5755831325641078488
Latest checkpoint's TimeLineID:       1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:          0/3
Latest checkpoint's NextOID:          10000
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        3
Latest checkpoint's oldestXID's DB:   0
Latest checkpoint's oldestActiveXID:  0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

If these values seem acceptable, use -f to force reset.
[rhaas pgsql]$ pg_resetxlog -f ~/pgdata
pg_resetxlog: pg_control exists but is broken or unknown version; ignoring it
Transaction log reset
[rhaas pgsql]$ postgres
LOG:  database system was shut down at 2012-06-19 15:25:28 EDT
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

So I still don't understand what problem we're solving here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

20 June 2012, 00:43:18

>> I believe if WAL files are proper as mentioned in Alvaro's mail, the
purposed logic should generate
>> correct values.
>> Do you see any problem in logic purposed in my original mail.
>> Can I resume my work on this feature?

> Maybe I'm missing your point, but... why don't you just use PITR to
> recover from the corruption of pg_control?

AFAIK PITR can be used in a scenario where there is a base back-up and we
have archived
the WAL files after that, now it can use WAL files to apply on base-backup.

In this scenario we don't know a point from where to start the next replay.
So I believe it will be difficult to use PITR in this scenario.


-----Original Message-----
From: Fujii Masao [mailto:masao.fujii@gmail.com]
Sent: Tuesday, June 19, 2012 7:44 PM
To: Amit Kapila
Cc: Tom Lane; Alvaro Herrera; Cédric Villemain; Pg Hackers; Robert Haas
Subject: Re: [HACKERS] Allow WAL information to recover corrupted
pg_controldata

On Tue, Jun 19, 2012 at 2:44 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> AFAIR you can create pg_control from scratch already with pg_resetxlog.
>> The hard part is coming up with values for the counters, such as the
>> next WAL location.  Some of them such as next OID are pretty harmless
>> if you don't guess right, but I'm worried that wrong next WAL could
>> make things worse not better.
>
> I believe if WAL files are proper as mentioned in Alvaro's mail, the
purposed logic should generate
> correct values.
> Do you see any problem in logic purposed in my original mail.
> Can I resume my work on this feature?

Maybe I'm missing your point, but... why don't you just use PITR to
recover from the corruption of pg_control?

Regards,

--
Fujii Masao

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

20 June 2012, 01:42:26

>>  I've got a problem with the assumption that, when pg_control is trash,
>>  megabytes or gigabytes of WAL can still be relied on completely.
>>
>>  I'm almost inclined to suggest that we not get next-LSN from WAL, but
>>  by scanning all the pages in the main data store and computing the max
>>  observed LSN.  This is clearly not very attractive from a performance
>>  standpoint, but it would avoid the obvious failure mode where you lost
>>  some recent WAL segments along with pg_control.

> I think it could be useful to have a tool that scans all the blocks
> and computes that value, but I'd want it to just print the value out
> and let me decide what to do about it.  There are cases where you
> don't necessarily want to clobber pg_control, but you do have future
> LSNs in your data file pages.  This can be either because the disk ate
> your WAL, or because you didn't create recovery.conf, or because your
> disk corrupted the LSNs on the data file pages.  I'd want a tool that
> could be either run on an individual file, or recursively on a
> directory.

The whole point is we need to find a valid next-LSN (Redo Replay location as
I understand).
If we let user decide about it, I think it can lead to inconsistent
database.

As per my understanding postgres database can come to consistent point only
if it has
both datafiles and WAL after crash. 
So I am not able to think if it lost WAL, how we can it make a consistent
database.

> If these values seem acceptable, use -f to force reset.
> [rhaas pgsql]$ pg_resetxlog -f ~/pgdata
> pg_resetxlog: pg_control exists but is broken or unknown version; ignoring
it
> Transaction log reset
> [rhaas pgsql]$ postgres
> LOG:  database system was shut down at 2012-06-19 15:25:28 EDT
> LOG:  database system is ready to accept connections
> LOG:  autovacuum launcher started

> So I still don't understand what problem we're solving here.

1. The values (like nextoid, nextxid, etc) are guessed values which can be
improved by having  these values from last checkpoint record using WAL files. 

2. The value for next-LSN (ControlFile.checkPointCopy.redo) will be guessed
value which if  directly used for recovery after pg_resetxlog will lead to inconsistent
database.  So I want to improve the logic to have either appropriate value for
next-LSN or more reliable value.

In documentation, it is mentioned that starting database after using
pg_resetxlog can contain inconsistent data.
The exact wording is mentioned below in mail.

My purposal to work on this Todo item is to improve the values generated for
pg_control, so that it becomes more easy for users to recover from database
corruption scenario's.
I don't think even after working on this feature, user can recover database
for all corruption scenario's. However it can improve the situation from
now. 

Pg_resetxlog documentation related excerpts- 
"After running this command, it should be possible to start the server, but
bear in mind that the database might contain inconsistent data due to
partially-committed transactions. You should immediately dump your data, run
initdb, and reload. After reload, check for inconsistencies and repair as
needed."
  

With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

20 June 2012, 10:22:37

> I'm almost inclined to suggest that we not get next-LSN from WAL, but
> by scanning all the pages in the main data store and computing the max
> observed LSN.  This is clearly not very attractive from a performance
> standpoint, but it would avoid the obvious failure mode where you lost
> some recent WAL segments along with pg_control.

According to my analysis, this will have some problem.
I will explain the problem by taking example scenario.

Example Scenario -
Let us assume that database crashes and it can be recovered by doing crash recovery.
Now assume we have Data files and WAL files intact and only control file is lost.
Now user uses pg_resetxlog to generate pg_control file and we uses new algorithm to generate next-LSN.

Summary of events before database crash-
1. Checkpoint was in progress and it has already noted next-LSN location (LSN-107) and mark the dirty pages as
BM_CHECKPOINT_NEEDED.
2. At this point a new transaction dirties 2 pages, first it dirties a fresh page (for this change LSN-108)   and then
itdirties one which is already marked as BM_CHECKPOINT_NEEDED (for this change LSN-109). 
3. CheckPoint starts flushing pages.
4. It will now flush the page with LSN-109 but not the page 108.
4. Checkpoint finishes.
5. Database crashes.

Normal Crash Recovery -
it will start the replay from 107 and after recovery the database will be in consistent state.

Pg_resetxlog -
It will generate the next-LSN point as 109 which when used for recovery will generate inconsistent database.
However if we would have relied on WAL, it would have got next-LSN as 107.

This is just an Example case to show that there can be some problems using the algorithm for generating
next-LSN from pages. However it doesn't prove that generating from WAL will be correct.

Please correct my understanding if I am wrong.
With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Aidan Van Dyk

Date:

20 June 2012, 10:42:52

On Wed, Jun 20, 2012 at 9:21 AM, Amit Kapila <amit.kapila@huawei.com> wrote:

> Example Scenario -

> Now assume we have Data files and WAL files intact and only control file is lost.

Just so I understand correctly, the aim of this is to "fix" the
situation where out of the thousands of files and 100s of GB of data
in my pg directory, the *only* corruption is that a single file
pg_control file is missing?

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Allow WAL information to recover corrupted pg_controldata

From

Tom Lane

Date:

20 June 2012, 12:14:26

Amit Kapila <amit.kapila@huawei.com> writes:
>> I'm almost inclined to suggest that we not get next-LSN from WAL, but
>> by scanning all the pages in the main data store and computing the max
>> observed LSN.  This is clearly not very attractive from a performance
>> standpoint, but it would avoid the obvious failure mode where you lost
>> some recent WAL segments along with pg_control.

> According to my analysis, this will have some problem. 

I think you're missing the point.  There is no possible way to guarantee
database consistency after applying pg_resetxlog, unless the database
had been cleanly shut down beforehand.  The reset will lose the xlog
information that was needed to restore consistency.  So arguing from
examples that demonstrate this is rather pointless.  Rather, the value
of pg_resetxlog is to be able to start the database at all so that info
can be extracted from it.  What we are looking for is not perfection,
because that's impossible, but just to not make a bad situation worse.
The reason I'm concerned about selecting a next-LSN that's certainly
beyond every LSN in the database is that not doing so could result in
introducing further corruption, which would be entirely avoidable with
more care in choosing the next-LSN.

> Pg_resetxlog -
> It will generate the next-LSN point as 109 which when used for recovery will generate inconsistent database.
> However if we would have relied on WAL, it would have got next-LSN as 107.

Umm ... the entire point of pg_resetxlog is to throw away WAL.  Not to
rely on it.

It's conceivable that there would be some use in a tool that searches
the available WAL files for the latest checkpoint record and recreates a
pg_control file pointing at that checkpoint, without zapping the WAL
files.  This would be much different in purpose and usage from
pg_resetxlog, though.
        regards, tom lane

Re: Allow WAL information to recover corrupted pg_controldata

From

Fujii Masao

Date:

20 June 2012, 13:08:07

On Wed, Jun 20, 2012 at 12:40 PM, Amit Kapila <amit.kapila@huawei.com> wrote:
>>> I believe if WAL files are proper as mentioned in Alvaro's mail, the
> purposed logic should generate
>>> correct values.
>>> Do you see any problem in logic purposed in my original mail.
>>> Can I resume my work on this feature?
>
>> Maybe I'm missing your point, but... why don't you just use PITR to
>> recover from the corruption of pg_control?
>
> AFAIK PITR can be used in a scenario where there is a base back-up and we
> have archived
> the WAL files after that, now it can use WAL files to apply on base-backup.

Yes. If you want to recover the database from the media crash like the
corruption of pg_control file, you basically should take a base backup
and set up continuous archiving.

> In this scenario we don't know a point from where to start the next replay.
> So I believe it will be difficult to use PITR in this scenario.

You can find out the point from the complete pg_control file which was
restored from the backup.

If pg_control is corrupted, we can easily imagine that other database files
would also be corrupted. I wonder how many cases where only pg_control
file gets corrupted are. In that case, pg_resetxlog is unhelpful at all.
You need to use PITR, intead.

Regards,

-- 
Fujii Masao

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

21 June 2012, 00:32:24

>> AFAIK PITR can be used in a scenario where there is a base back-up and we
>> have archived
>> the WAL files after that, now it can use WAL files to apply on
base-backup.

> Yes. If you want to recover the database from the media crash like the
> corruption of pg_control file, you basically should take a base backup
> and set up continuous archiving.

>> In this scenario we don't know a point from where to start the next
replay.
>> So I believe it will be difficult to use PITR in this scenario.

>You can find out the point from the complete pg_control file which was
>restored from the backup.

Yes, it can work the way you have explained or even by using Replication
solutions where
user can recreate the database from slave or other copy.
But the tool pg_resetxlog or similar tools are provided to handle situations
where user 
has not taken care enough to be saved from corruption.

With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

21 June 2012, 00:32:53

> Just so I understand correctly, the aim of this is to "fix" the
> situation where out of the thousands of files and 100s of GB of data
> in my pg directory, the *only* corruption is that a single file
> pg_control file is missing?

This is just an example I have used to explain what should be the best way to generate
Next-LSN.
The overall aim for this feature is to start the database with as much accuracy as possible after database corruption
occurreddue to missing files or hardware crash.  
However it is not possible to start with full consistency and accuracy after such a
Situation.

-----Original Message-----
From: Aidan Van Dyk [mailto:aidan@highrise.ca]
Sent: Wednesday, June 20, 2012 7:13 PM
To: Amit Kapila
Cc: Pg Hackers
Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

On Wed, Jun 20, 2012 at 9:21 AM, Amit Kapila <amit.kapila@huawei.com> wrote:

> Example Scenario -

> Now assume we have Data files and WAL files intact and only control file is lost.

Just so I understand correctly, the aim of this is to "fix" the
situation where out of the thousands of files and 100s of GB of data
in my pg directory, the *only* corruption is that a single file
pg_control file is missing?

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

21 June 2012, 02:25:06

> I think you're missing the point.  There is no possible way to guarantee database
> consistency after applying pg_resetxlog, unless the database had been cleanly shut
> down beforehand.  The reset will lose the xlog information that was needed to restore
> consistency.  So arguing from examples that demonstrate this is rather pointless.
> Rather, the value of pg_resetxlog is to be able to start the database at all so that
> info can be extracted from it.  What we are looking for is not perfection, because
> that's impossible, but just to not make a bad situation worse.

I got the point that we cannot reconstruct a consistent database where further operations can be allowed.

> The reason I'm concerned about selecting a next-LSN that's certainly beyond every LSN in the database is that not
doing 
> so could result in introducing further corruption, which would be entirely avoidable with more care in choosing the
> next-LSN.

The further corruption can only be possible when we replay some wrong WAL by selecting wrong LSN.
Do you mean to say that if next-LSN is selected from pages, it will be a better position for starting Replay.
On the otherhand if we don't have to replay the WAL and just take the dump, how it will matter what is next-LSN.

>> Pg_resetxlog -
>> It will generate the next-LSN point as 109 which when used for recovery will generate inconsistent database.
>> However if we would have relied on WAL, it would have got next-LSN as 107.

>Umm ... the entire point of pg_resetxlog is to throw away WAL.  Not to
>rely on it.

Yes, that is right but the solution I have purposed in my first mail was not to reset it after getting consistent
checkpointand generating control file values from it. 
However now I understand that the problem and solution definition should consider that some WAL files are missing.

> It's conceivable that there would be some use in a tool that searches
> the available WAL files for the latest checkpoint record and recreates a
> pg_control file pointing at that checkpoint, without zapping the WAL
> files.  This would be much different in purpose and usage from
> pg_resetxlog, though.

I will collect all the requirements that can be done in this area from existing mails and
think more in it to generate concrete set of requirements that can be helpful to users to over come
such situations.

Requirements or suggestions from all are most welcome. It can help me to do some useful work for
PostgreSQL.

With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Tom Lane

Date:

21 June 2012, 04:13:02

Amit Kapila <amit.kapila@huawei.com> writes:
>> The reason I'm concerned about selecting a next-LSN that's certainly beyond every LSN in the database is that not
doing

>> so could result in introducing further corruption, which would be entirely avoidable with more care in choosing the

>> next-LSN.

> The further corruption can only be possible when we replay some wrong
> WAL by selecting wrong LSN.

No, this is mistaken.  Pages in the database that have LSN ahead of
where the server thinks the end of WAL is cause lots of problems
unrelated to replay; for example, inability to complete a checkpoint.
That might not directly lead to additional corruption, but consider
the case where such a page gets further modified, and the server decides
it doesn't need to create a full-page image because the LSN is ahead of
where the last checkpoint was.  A crash or two later, you have new
problems.

(Admittedly, once you've run pg_resetxlog you're best advised to just be
trying to dump what you've got, and not modify it more.  But sometimes
you have to hack the data just to get pg_dump to complete.)
        regards, tom lane

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

21 June 2012, 06:00:21

>>> The reason I'm concerned about selecting a next-LSN that's certainly beyond every LSN in the database is that not
doing 
>>> so could result in introducing further corruption, which would be entirely avoidable with more care in choosing the

>>> next-LSN.

>> The further corruption can only be possible when we replay some wrong
>> WAL by selecting wrong LSN.

> No, this is mistaken.  Pages in the database that have LSN ahead of
> where the server thinks the end of WAL is cause lots of problems
> unrelated to replay; for example, inability to complete a checkpoint.
> That might not directly lead to additional corruption, but consider
> the case where such a page gets further modified, and the server decides
> it doesn't need to create a full-page image because the LSN is ahead of
> where the last checkpoint was.  A crash or two later, you have new
> problems.

Incase any modification happen to the database after it started, even if the next-LSN is max LSN of pages,
the modification can create a problem because the database will be in inconsistent state.

Please correct me if I am wrong in assuming that the next-LSN having value as max LSN of pages
1. has nothing to do with Replay. We should still reset the WAL so that no replay happens.
2. It is to avoid some further disasters.

With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

22 June 2012, 06:31:43

Based on the discussion and suggestions in this mail chain, following features can be implemented:

1. To compute the value of max LSN in data pages based on user input whether he wants it for an individual file,  a
particulardirectory or whole database. 

2a. To search the available WAL files for the latest checkpoint record and prints the value.
2b. To search the available WAL files for the latest checkpoint record and recreates a pg_control file pointing at that
checkpoint.

I have kept both options to address different kind of corruption scenarios.

1. WAL files are in separate partition which is not corrupt, only the partition where data files and pg_control is
corrupt.In this case users can use options 2a or 2b to proceed. 
2. All pg_control, data, WAL are on same disk partition which got corrupt.   In this case he can use options 1 and 2a
todecide the next-LSN for pg_control and proceed. 

Suggestions?

If there is an agreement to do this features, I can send the proposal which kind of options we can keep in existing or
newutility for the usage. 

With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Robert Haas

Date:

22 June 2012, 12:29:35

On Fri, Jun 22, 2012 at 5:25 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> Based on the discussion and suggestions in this mail chain, following features can be implemented:
>
> 1. To compute the value of max LSN in data pages based on user input whether he wants it for an individual file,
>   a particular directory or whole database.
>
> 2a. To search the available WAL files for the latest checkpoint record and prints the value.
> 2b. To search the available WAL files for the latest checkpoint record and recreates a pg_control file pointing at
thatcheckpoint. 
>
> I have kept both options to address different kind of corruption scenarios.

I think I can see all of those things being potentially useful.  There
are a couple of pending patches that will revise the WAL format
slightly; not sure how much those are likely to interfere with any
development you might do on (2) in the meantime.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

23 June 2012, 03:59:59


-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Friday, June 22, 2012 8:59 PM
To: Amit Kapila
Cc: Tom Lane; Alvaro Herrera; Cédric Villemain; Pg Hackers
Subject: Re: [HACKERS] Allow WAL information to recover corrupted
pg_controldata

On Fri, Jun 22, 2012 at 5:25 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> Based on the discussion and suggestions in this mail chain, following
features can be implemented:
>>
>> 1. To compute the value of max LSN in data pages based on user input
whether he wants it for an individual file,
>>   a particular directory or whole database.
>>
>> 2a. To search the available WAL files for the latest checkpoint record
and prints the value.
>> 2b. To search the available WAL files for the latest checkpoint record
and recreates a pg_control file pointing at that checkpoint.
>>
>> I have kept both options to address different kind of corruption
scenarios.

> I think I can see all of those things being potentially useful.

I shall start working on design and usage(how to provide these options to
users) of the features and present it once I am done.

> There are a couple of pending patches that will revise the WAL format
> slightly; not sure how much those are likely to interfere with any
> development you might do on (2) in the meantime.
Thanks.
I shall look into the patches (WAL Format change by Heikki and any other for
Logical Replication) to see which changes can effect the
implementation/design.


With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

05 July 2012, 02:01:28

From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Friday, June 22, 2012 8:59 PM
On Fri, Jun 22, 2012 at 5:25 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> Based on the discussion and suggestions in this mail chain, following
features can be implemented:
>>
>> 1. To compute the value of max LSN in data pages based on user input
whether he wants it for an individual file,
>>   a particular directory or whole database.
>>
>> 2a. To search the available WAL files for the latest checkpoint record
and prints the value.
>> 2b. To search the available WAL files for the latest checkpoint record
and recreates a pg_control file pointing at that checkpoint.
>>
>> I have kept both options to address different kind of corruption
scenarios.

> I think I can see all of those things being potentially useful.  There
> are a couple of pending patches that will revise the WAL format
> slightly; not sure how much those are likely to interfere with any
> development you might do on (2) in the meantime.

Below is the details of Option-2, for Option-1, I will send mail separately

New option for pg_resetxlog:
-----------------------------
1. Introduce option -r to restore the control file if possible and print
those values.
3. User need to give option -f along with -r to rewrite the control file
from WAL files.
2. If not able to get the control information from WAL files then the   control data will be guessed and proceedes as
normalreset xlog.  
4. If the control information is restored, then the option -l is ignored.

Design for new Option:
----------------------

1.  Validate the pg_xlog directory before proceeding of restoring control
values. if the directory    is invalid then the control values will be guessed.
2.  Read the pg_xlog directory and read all the existing files.
3.  If it is a valid xlog file then add it to a list in an increasing order,
Otherwise the file    is ignored and continue to the next file.
4.  Try to find the last timestamp file from the list to start reading for a
checkpoint record.
5.  Read the first page from the file and validate it. if the validation
fails the restore happens with    guessed values.
6.  Read the first record as start of the record from the identified first
xlog file.
7.  If the first record is a continuation record from a previous record then
ignore the record    and continue to the next record.
8.  After getting the entire record then the record is validated, if it is
not a valid record    searching for the next record will be stopped and the control values
will be guessed.
9.  Search all the files to the end of the last file to get the latest
checkpoint record.
10. While searching for the record, if it is not reaching the last file
(there is missing file or invalid record)    then treat this scenario as a failure of finding the checkpoint record
and go for guessing the control values.
11. After finding the last checkpoint record, update the checkpoint record
information in the control file.

Implementation:
----------------
1. We need to use most of the functionality of functions mentioned below.
One way is to duplicate the code of these   functions related to functionality required by pg_resetxlog in
pg_resetxlog module. I have checked other modules also   but didn't find how we can use common functionality in server
utility
from backend code.   Could you please point me for the appropriate way for doing it.
  The list of functions:  1. ValidateXLOGDirectoryStructure   2. XLogPageRead   3. ReadRecord   4. RecordIsValid   5.
ValidXLOGPageHeader  6. ValidXLogRecordHeader 

Suggestions/Comments/Thoughts?

With Regards,
Amit Kapila.

Re: Allow WAL information to recover corrupted pg_controldata

From

Alvaro Herrera

Date:

05 November 2012, 14:03:01

I'm not sure what to do with this patch.  There was some resistance to
the idea originally; then after some discussion, there was some
apparent agreement that it might be useful on occasion.  Later, a patch
was posted, but there was almost no review of it; except to say that it
should probably be reworked on top of an hypothetical, future XLogReader
feature.

Since it doesn't look like we're going anywhere with it soon, I'm going
to close it as returned with feedback.  Hopefully, if we get XLogReader
in 9.3, we will have time to rebase this patch on top of that.  (I
invite Amit to give Heikki's version of XLogReader patch a look.)

(It is very hard to track down vague references to old threads that
aren't properly linked in new threads.  Please make sure to reply to old
emails, or at least to give Message-Ids or URLs when starting new
threads.  I am replying to one message of each old thread here.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Allow WAL information to recover corrupted pg_controldata

From

Amit Kapila

Date:

05 November 2012, 14:39:14

On Monday, November 05, 2012 7:33 PM  Alvaro Herrera wrote:
> I'm not sure what to do with this patch.  There was some resistance to
> the idea originally; then after some discussion, there was some
> apparent agreement that it might be useful on occasion.  Later, a patch
> was posted, but there was almost no review of it; except to say that it
> should probably be reworked on top of an hypothetical, future XLogReader
> feature.
> 
> Since it doesn't look like we're going anywhere with it soon, I'm going
> to close it as returned with feedback.  Hopefully, if we get XLogReader
> in 9.3, we will have time to rebase this patch on top of that.  (I
> invite Amit to give Heikki's version of XLogReader patch a look.)

The patch for which Heikki has given comment
(https://commitfest.postgresql.org/action/patch_view?id=897) is already
moved to next CF.
This was not related to XLogReader. However as there is not much interest in
this feature, so it is okay. 
> (It is very hard to track down vague references to old threads that
> aren't properly linked in new threads.  Please make sure to reply to old
> emails, or at least to give Message-Ids or URLs when starting new
> threads.  I am replying to one message of each old thread here.)

My mistake, I am sorry for that and I shall try to take care for future
work.

With Regards,
Amit Kapila.