Thread: Patch for option in pg_resetxlog for restore from WAL files
The patch implementing the design in the mail chain below is attached to this mail.
Please let me know if there are any problems or objections.
Shall I add it to the CF (2012-09)?
Test-1
1. Start the server and do some operations.
2. Kill the server with the abort option, delete the pg_control file of the database while the server process is in progress, and create an empty file with the same name.
3. Check the restore of the pg_control file with the -f option.
Result
The control file is recreated and recovery happens with the remaining redo after server restart.
Test-2
1. Start the server and shut it down once a checkpoint has happened and the old xlog files have been cleaned up, so that only one checkpoint record is present in the xlog files.
2. Kill the server with the abort option, delete the pg_control file of the database, and create an empty file with the same name.
3. Check the restore of the pg_control file with the -f option.
Result
The control file is recreated and recovery happens with the remaining redo after server restart.
Test-3
1. Start the server and execute operations that result in a record header being split between two redo files.
2. Test the restore of the control file after the server is shut down and the control file is deleted and recreated as an empty file with the same name.
3. Delete the first xlog file, which contains the checkpoint record.
4. Check the restore of the pg_control file with the -f option.
Result
The control file is recreated with guessed values and recovery is not possible.
Test-4
1. Start the server and shut it down normally.
2. Delete the pg_control file of the database and create an empty file with the same name.
3. Delete all the valid xlog files and create some invalid xlog files.
4. Check the restore of the pg_control file with the -f option.
Result
The control file is recreated with guessed values and recovery is not possible.
Test-5
1. Start the server on the database.
2. Delete the pg_control file of a database and create an empty file with the same name.
3. Try to restore the control file while the server is still running on the same database.
Result
The restore of the control file fails because the server is already running and the pid file already exists.
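The setup shared by these tests (replace pg_control with an empty file, then see whether a server still holds the data directory) could be scripted roughly as below. This is only an illustrative Python sketch, not part of the patch: the function names are hypothetical, and it assumes the usual data directory layout where the control file is global/pg_control and a running server leaves postmaster.pid behind.

```python
import os


def blank_control_file(datadir):
    """Replace global/pg_control with an empty file of the same name."""
    path = os.path.join(datadir, "global", "pg_control")
    if os.path.exists(path):
        os.remove(path)
    # Create a zero-length file with the same name (step 2 of the tests).
    open(path, "wb").close()
    return path


def server_appears_running(datadir):
    """Return True if the postmaster lock file exists (the Test-5 case)."""
    return os.path.exists(os.path.join(datadir, "postmaster.pid"))
```

A test driver would call blank_control_file() on a scratch cluster before running the restore, and in Test-5 expect the restore to be refused whenever server_appears_running() is true.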
>From: Amit Kapila [mailto:amit.kapila@huawei.com]
>Sent: Thursday, July 05, 2012 10:21 AM
>>From: Robert Haas [mailto:robertmhaas@gmail.com]
>>Sent: Friday, June 22, 2012 8:59 PM
>>On Fri, Jun 22, 2012 at 5:25 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> Based on the discussion and suggestions in this mail chain, the following features can be implemented:
>>
>> 1. To compute the value of max LSN in data pages based on user input: whether he wants it for an individual file, a particular directory or the whole database.
>>
>> 2a. To search the available WAL files for the latest checkpoint record and print the value.
>> 2b. To search the available WAL files for the latest checkpoint record and recreate a pg_control file pointing at that checkpoint.
>>
>> I have kept both options to address different kinds of corruption scenarios.
> I think I can see all of those things being potentially useful. There
> are a couple of pending patches that will revise the WAL format
> slightly; not sure how much those are likely to interfere with any
> development you might do on (2) in the meantime.
> Below are the details of Option-2; for Option-1, I will send a separate mail.
> New option for pg_resetxlog:
> -----------------------------
> 1. Introduce option -r to restore the control file values from the WAL files if possible and print them.
> 2. The user needs to give option -f along with -r to rewrite the control file from the WAL files.
> 3. If the control information cannot be obtained from the WAL files, then the control data will be guessed and pg_resetxlog proceeds as a normal xlog reset.
> 4. If the control information is restored, then the option -l is ignored.
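The interaction of the options above can be summarized as a small decision function. This is a hedged Python sketch of the rules as stated in this mail, not the patch's actual code; the function name, parameter names, and the returned dictionary keys are all hypothetical.

```python
def plan_restore(force, wal_info_found, l_given=False):
    """Decide what a run with -r does, following the four rules above.

    force          -- True if -f was given along with -r
    wal_info_found -- True if a checkpoint record was found in the WAL files
    l_given        -- True if -l was also given on the command line
    """
    return {
        # Rules 1 and 3: values come from WAL if possible, else are guessed.
        "values_from": "wal" if wal_info_found else "guessed",
        # Rule 2: the control file is only rewritten when -f is given.
        "rewrite_control_file": force,
        # Rule 4: -l is ignored once the control information is restored.
        "honor_l": l_given and not wal_info_found,
    }
```

For example, plan_restore(force=True, wal_info_found=True, l_given=True) rewrites the control file from WAL-derived values and ignores -l.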
> Design for new Option:
> ----------------------
> 1. Validate the pg_xlog directory before proceeding to restore the control values. If the directory is invalid, then the control values will be guessed.
> 2. Read the pg_xlog directory and all the existing files in it.
> 3. If a file is a valid xlog file, then add it to a list in increasing order; otherwise ignore the file and continue to the next file.
> 4. Try to find the file with the last timestamp in the list, from which to start reading for a checkpoint record.
> 5. Read the first page from the file and validate it. If the validation fails, the restore happens with guessed values.
> 6. Read the first record, as the start of a record, from the identified first xlog file.
> 7. If the first record is a continuation of a previous record, then ignore it and continue to the next record.
> 8. After the entire record has been read, validate it. If it is not a valid record, the search for the next record is stopped and the control values will be guessed.
> 9. Search all the files to the end of the last file to get the latest checkpoint record.
> 10. While searching for the record, if the search does not reach the last file (there is a missing file or an invalid record), then treat this scenario as a failure to find the checkpoint record and guess the control values.
> 11. After finding the last checkpoint record, update the checkpoint record information in the control file.
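The file selection and search in the steps above can be sketched on a simplified in-memory model. This is an illustrative Python sketch only, not the patch: the function names and the record dictionaries are hypothetical, WAL segment names are taken to be 24 uppercase hex digits, and an invalid record is assumed to mean "end of WAL" when it occurs in the last file and a failure otherwise (one reading of steps 8 and 10).

```python
import re

SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def valid_segments(names):
    """Steps 2-3: keep names that look like WAL segments, in increasing order."""
    return sorted(n for n in names if SEGMENT_NAME.fullmatch(n))


def latest_checkpoint(segments):
    """Steps 6-10 on a simplified model.

    `segments` is a list (oldest first) of record lists; each record is a dict
    with boolean keys 'valid', 'continuation' and 'checkpoint'. A segment of
    None stands for a missing file. Returns (segment_index, record_index) of
    the latest checkpoint record, or None when the values must be guessed.
    """
    found = None
    last = len(segments) - 1
    for seg_idx, records in enumerate(segments):
        if records is None:
            return None  # step 10: a file is missing before the last file
        for rec_idx, rec in enumerate(records):
            if seg_idx == 0 and rec_idx == 0 and rec["continuation"]:
                continue  # step 7: skip the tail of a record begun earlier
            if not rec["valid"]:
                if seg_idx != last:
                    return None  # step 10: invalid record before the last file
                break  # assumed: invalid data in the last file ends the WAL
            if rec["checkpoint"]:
                found = (seg_idx, rec_idx)  # step 9: keep the latest one
    return found
```

Returning None corresponds to the fallback in the design: the caller would guess the control values instead of using a checkpoint from WAL.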
> Implementation:
> ----------------
> 1. We need to use most of the functionality of the functions mentioned below. One way is to duplicate, in the pg_resetxlog module, the parts of these functions that pg_resetxlog requires. I have checked other modules also, but didn't find how a server utility can use common functionality from backend code.
> Could you please point me to the appropriate way of doing it?
> The list of functions:
> 1. ValidateXLOGDirectoryStructure
> 2. XLogPageRead
> 3. ReadRecord
> 4. RecordIsValid
> 5. ValidXLOGPageHeader
> 6. ValidXLogRecordHeader
> Suggestions/Comments/Thoughts?
With Regards,
Amit Kapila.
I have uploaded the patch for the new option in pg_resetxlog at the location below:
https://commitfest.postgresql.org/action/patch_view?id=897
This completes the implementation of Option-2 discussed in the mail below.
Now I will work on Option-1 (to compute the value of max LSN in data pages based on user input: whether he wants it for an individual file, a particular directory or the whole database).
> Sent: Wednesday, July 18, 2012 7:17 PM
> Patch implementing the design in below mail chain is attached with this mail.
With Regards,
Amit Kapila.
On 18.07.2012 16:47, Amit kapila wrote:
> Patch implementing the design in below mail chain is attached with this mail.
This patch copies the ReadRecord() function and a bunch of related functions from xlog.c into pg_resetxlog.c. There's a separate patch in the current commitfest to make that code reusable, without having to copy-paste it to every tool that wants to parse the WAL. See https://commitfest.postgresql.org/action/patch_view?id=860. This patch should be refactored to make use of that framework, as soon as it's committed.
- Heikki
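One way to picture the reuse being asked for here: the record-walking logic takes its page source as a callback, so the server and a tool such as pg_resetxlog can share the same code and differ only in how pages are fetched. The Python sketch below only illustrates that idea; it is not the interface of the patch referenced above, and the names are hypothetical.

```python
def iter_records(read_page, first_page=0):
    """Yield records page by page until the callback reports no more pages.

    read_page(page_no) is supplied by the caller and returns a list of
    records for that page, or None when the page is not available.
    """
    page_no = first_page
    while True:
        page = read_page(page_no)
        if page is None:
            return
        for record in page:
            yield record
        page_no += 1
```

A tool would pass a callback that reads files from pg_xlog, while the backend would pass its own; neither needs a copy of the loop.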
> On Monday, September 24, 2012 2:30 PM Heikki Linnakangas wrote:
> On 18.07.2012 16:47, Amit kapila wrote:
> > Patch implementing the design in below mail chain is attached with this mail.
>
> This patch copies the ReadRecord() function and a bunch of related functions from xlog.c into pg_resetxlog.c. There's a separate patch in the current commitfest to make that code reusable, without having to copy-paste it to every tool that wants to parse the WAL. See https://commitfest.postgresql.org/action/patch_view?id=860. This patch should be refactored to make use of that framework, as soon as it's committed.
Sure. Thanks for the feedback.
With Regards,
Amit Kapila.
On 24 September 2012 04:00, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 18.07.2012 16:47, Amit kapila wrote:
>> Patch implementing the design in below mail chain is attached with this mail.
>
> This patch copies the ReadRecord() function and a bunch of related functions from xlog.c into pg_resetxlog.c. There's a separate patch in the current commitfest to make that code reusable, without having to copy-paste it to every tool that wants to parse the WAL. See https://commitfest.postgresql.org/action/patch_view?id=860. This patch should be refactored to make use of that framework, as soon as it's committed.
Agreed, moving to next commitfest.
Amit, suggest review of the patch that this now depends upon.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tuesday, September 25, 2012 6:27 PM Simon Riggs wrote:
> On 24 September 2012 04:00, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> > On 18.07.2012 16:47, Amit kapila wrote:
> >> Patch implementing the design in below mail chain is attached with this mail.
> >
> > This patch copies the ReadRecord() function and a bunch of related functions from xlog.c into pg_resetxlog.c. There's a separate patch in the current commitfest to make that code reusable, without having to copy-paste it to every tool that wants to parse the WAL. See https://commitfest.postgresql.org/action/patch_view?id=860. This patch should be refactored to make use of that framework, as soon as it's committed.
>
> Agreed, moving to next commitfest.
>
> Amit, suggest review of the patch that this now depends upon.
Earlier I thought I would try to finish it in this CommitFest if the XLogReader patch gets committed by next week. However, if you feel it is better to work on it for the next CommitFest, I shall do it that way.
With Regards,
Amit Kapila.