Re: pg_rewind, a tool for resynchronizing an old master after failover - Mailing list pgsql-hackers

From Thom Brown
Subject Re: pg_rewind, a tool for resynchronizing an old master after failover
Msg-id CAA-aLv7DtytDFnBoLW=sKzZLOLkQ_r+5qhtJDZ6BjKGUxY3y3Q@mail.gmail.com
In response to Re: pg_rewind, a tool for resynchronizing an old master after failover  (Thom Brown <thom@linux.com>)
Responses Re: pg_rewind, a tool for resynchronizing an old master after failover
List pgsql-hackers
On 23 May 2013 10:03, Thom Brown <thom@linux.com> wrote:
> On 23 May 2013 07:10, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> Hi,
>>
>> I've been hacking on a tool to allow resynchronizing an old master server
>> after failover. The need to do a full backup/restore has been a common
>> complaint ever since we've had streaming replication. I saw on the wiki that
>> this was discussed in the dev meeting; too bad I couldn't make it.
>>
>> In a nutshell, the idea is to copy everything that has changed between
>> the clusters, like rsync does, but instead of reading through all files, use
>> the WAL to determine what has changed. Here's a somewhat more detailed
>> explanation, from the README:
>>
>> Theory of operation
>> -------------------
>>
>> The basic idea is to copy everything from the new cluster to the old,
>> except for the blocks that we know to be the same.
>>
>> 1. Scan the WAL log of the old cluster, starting from the point where
>> the new cluster's timeline history forked off from the old cluster. For each
>> WAL record, make a note of the data blocks that are touched. This yields a
>> list of all the data blocks that were changed in the old cluster, after the
>> new cluster forked off.
>>
>> 2. Copy all those changed blocks from the new master to the old master.
>>
>> 3. Copy all other files, like clog, conf files etc., from the new cluster
>> to the old. Everything except the relation files.
>>
>> 4. Apply the WAL from the new master, starting from the checkpoint
>> created at failover. (pg_rewind doesn't actually apply the WAL, it just
>> creates a backup label file indicating that when PostgreSQL is started, it
>> will start replay from that checkpoint and apply all the required WAL)
>>
>>
>> Please take a look: https://github.com/vmware/pg_rewind
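
The four-step rewind described in the README above can be sketched as a toy Python model. This is purely illustrative (all names and structures here are hypothetical, and the real pg_rewind is written in C and parses actual WAL records); it only shows the shape of steps 1 and 2:

```python
# Toy model of the rewind approach: collect blocks touched by WAL after the
# fork point, then copy only those blocks from the new master to the old.
# Hypothetical data structures; not the actual pg_rewind source.

# Step 1: scan old-cluster WAL from the fork point, noting touched blocks.
def changed_blocks(wal_records_since_fork):
    touched = set()
    for rec in wal_records_since_fork:
        for relfile, blockno in rec["blocks"]:
            touched.add((relfile, blockno))
    return touched

# Step 2: copy only those blocks from the new master to the old.
def copy_changed_blocks(touched, read_block_from_new, write_block_to_old):
    for relfile, blockno in sorted(touched):
        write_block_to_old(relfile, blockno,
                           read_block_from_new(relfile, blockno))

# Steps 3-4 (copying non-relation files, writing a backup label) omitted.

# Tiny demonstration with fake WAL records; note the duplicate block,
# which is copied only once:
wal = [
    {"blocks": [("base/16384/16385", 0), ("base/16384/16385", 7)]},
    {"blocks": [("base/16384/16390", 2), ("base/16384/16385", 0)]},
]
copied = {}
copy_changed_blocks(
    changed_blocks(wal),
    lambda rel, blk: b"new-master-data",
    lambda rel, blk, data: copied.__setitem__((rel, blk), data),
)
print(sorted(copied))  # three distinct (file, block) pairs
```

The point of the scheme is that the WAL scan bounds the copy to blocks actually modified on the old timeline, instead of checksumming every file the way a plain rsync would.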
>
> 6 instances set up:
>
> [Primary (5530)]
>     |
>     ---[Standby 1 (5531)]
>             |
>             ---[Standby 2 (5532)]
>             |
>             ---[Standby 3 (5533)]
>             |
>             ---[Standby 4 (5534)]
>             |
>             ---[Standby 5 (5535)]
>
> 1) Created a table on the primary with some data.
> 2) Promoted Standby 1
> 3) Cleanly shut down Primary
> 4) pg_rewind --target-pgdata=/tmp/primary
> --source-server='host=localhost port=5531 dbname=postgres'
>
> Last common WAL position: 0/30227F8 on timeline 1
> Last common checkpoint at 0/30227F8 on timeline 1
> error reading xlog record: record with zero length at 0/3022860
> Done!
>
> Contents of pg_xlog directory in Primary and Standby 1:
> thom@swift /tmp $ ls -l primary/pg_xlog/
> total 49156
> -rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000002
> -rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000003
> -rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
> -rw------- 1 thom users       41 May 23 09:52 00000002.history
> drwx------ 2 thom users       80 May 23 09:52 archive_status
> thom@swift /tmp $ ls -l standby1/pg_xlog/
> total 49156
> -rw------- 1 thom users 16777216 May 23 09:49 000000010000000000000002
> -rw------- 1 thom users 16777216 May 23 09:50 000000010000000000000003
> -rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
> -rw------- 1 thom users       41 May 23 09:50 00000002.history
> drwx------ 2 thom users       80 May 23 09:50 archive_status
>
> 5) Changed recovery.done in primary to point its primary_conninfo port
> to 5531 (that of Standby 1).
> 6) Renamed it to .conf.
> 7) Changed postgresql.conf to set the port back to its original one
> (as pg_rewind has caused it to match that of Standby 1)
> 8) Start Primary
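
For reference, steps 5-7 amount to config changes along these lines (hypothetical file contents; a sketch of the reconfiguration, not the exact files used in the test):

```
# /tmp/primary/recovery.conf  (renamed from recovery.done)
standby_mode = 'on'
primary_conninfo = 'host=localhost port=5531'   # now follows Standby 1

# /tmp/primary/postgresql.conf
port = 5530   # restore the old master's original port
```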
>
> Latest log in primary reads:
>
> LOG:  database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
> LOG:  entering standby mode
> LOG:  invalid xl_info in checkpoint record
> FATAL:  could not locate required checkpoint record
> HINT:  If you are not restoring from a backup, try removing the file
> "/tmp/primary/backup_label".
> LOG:  startup process (PID 31503) exited with exit code 1
> LOG:  aborting startup due to startup process failure
>
> 9) Okay, so I'll delete that label and try again.  Now all is well:
>
> LOG:  database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
> LOG:  entering standby mode
> LOG:  database system was not properly shut down; automatic recovery in progress
> LOG:  redo starts at 0/3022828
> LOG:  record with zero length at 0/3041A60
> LOG:  consistent recovery state reached at 0/3041A60
> LOG:  database system is ready to accept read only connections
> LOG:  started streaming WAL from primary at 0/3000000 on timeline 2
>
> 10) Connect to Standby 1 and insert more rows into our original table.
> 11) Connect to Primary and those rows are appearing.

By the way, without any data inserted I get:

thom@swift /tmp $ pg_rewind --target-pgdata=/tmp/primary
--source-server='host=localhost port=5531 dbname=postgres' -v
connected to remote server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/00000002.history", length 41
Last common WAL position: 0/3000000 on timeline 1
pg_rewind: xlogreader.c:214: XLogReadRecord: Assertion `((RecPtr) %
8192 >= (((intptr_t) ((sizeof(XLogPageHeaderData))) + ((8) - 1)) &
~((intptr_t) ((8) - 1))))' failed.
Aborted (core dumped)

And this magically goes away with -k (data checksums) on initdb.

--
Thom


