Thread: pg_rewind, a tool for resynchronizing an old master after failover

pg_rewind, a tool for resynchronizing an old master after failover

From

Heikki Linnakangas

Date:

23 May 2013, 14:10:21

Hi,

I've been hacking on a tool to allow resynchronizing an old master 
server after failover. The need to do a full backup/restore has been a 
common complaint ever since we've had streaming replication. I saw on 
the wiki that this was discussed in the dev meeting; too bad I couldn't 
make it.

In a nutshell, the idea is to do copy everything that has changed 
between the cluster, like rsync does, but instead of reading through all 
files, use the WAL to determine what has changed. Here's a somewhat more 
detailed explanation, from the README:

Theory of operation
-------------------

The basic idea is to copy everything from the new cluster to old, except 
for the blocks that we know to be the same.

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For 
each WAL record, make a note of the data blocks that are touched. This 
yields a list of all the data blocks that were changed in the old 
cluster, after the new cluster forked off.

2. Copy all those changed blocks from the new master to the old master.

3. Copy all other files like clog, conf files etc. from the new cluster
to old. Everything except the relation files.

4. Apply the WAL from the new master, starting from the checkpoint
created at failover. (pg_rewind doesn't actually apply the WAL, it just 
creates a backup label file indicating that when PostgreSQL is started, 
it will start replay from that checkpoint and apply all the required WAL)


Please take a look: https://github.com/vmware/pg_rewind

- Heikki

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Robert Haas

Date:

23 May 2013, 14:55:46

On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> 1. Scan the WAL log of the old cluster, starting from the point where
> the new cluster's timeline history forked off from the old cluster. For each
> WAL record, make a note of the data blocks that are touched. This yields a
> list of all the data blocks that were changed in the old cluster, after the
> new cluster forked off.

Suppose that a transaction is open and has written tuples at the point
where WAL forks.  After WAL forks, the transaction commits.  Then, it
hints some of the tuples that it wrote.  There is no record in WAL
that those blocks are changed, but failing to revert them leads to
data corruption.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Simon Riggs

Date:

23 May 2013, 15:03:27

On 23 May 2013 12:10, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> Please take a look: https://github.com/vmware/pg_rewind

The COPYRIGHT file shows that VMware is claiming copyright on unstated
parts of the code for this. As such, its not a normal submission to
the PostgreSQL project, which involves placing copyright with the
PGDG.

As a result, while it sounds interesting, people should be aware of
that and I suggest we shouldn't discuss that code on this list, to
avoid any disputes should we decide to include a similar facility in
core Postgres in the future.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Thom Brown

Date:

23 May 2013, 17:04:36

On 23 May 2013 07:10, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Hi,
>
> I've been hacking on a tool to allow resynchronizing an old master server
> after failover. The need to do a full backup/restore has been a common
> complaint ever since we've had streaming replication. I saw on the wiki that
> this was discussed in the dev meeting; too bad I couldn't make it.
>
> In a nutshell, the idea is to do copy everything that has changed between
> the cluster, like rsync does, but instead of reading through all files, use
> the WAL to determine what has changed. Here's a somewhat more detailed
> explanation, from the README:
>
> Theory of operation
> -------------------
>
> The basic idea is to copy everything from the new cluster to old, except for
> the blocks that we know to be the same.
>
> 1. Scan the WAL log of the old cluster, starting from the point where
> the new cluster's timeline history forked off from the old cluster. For each
> WAL record, make a note of the data blocks that are touched. This yields a
> list of all the data blocks that were changed in the old cluster, after the
> new cluster forked off.
>
> 2. Copy all those changed blocks from the new master to the old master.
>
> 3. Copy all other files like clog, conf files etc. from the new cluster
> to old. Everything except the relation files.
>
> 4. Apply the WAL from the new master, starting from the checkpoint
> created at failover. (pg_rewind doesn't actually apply the WAL, it just
> creates a backup label file indicating that when PostgreSQL is started, it
> will start replay from that checkpoint and apply all the required WAL)
>
>
> Please take a look: https://github.com/vmware/pg_rewind

6 instances set up:

[Primary (5530)]   |   ---[Standby 1 (5531)]           |           ---[Standby 2 (5532)]           |
---[Standby3 (5533)]           |           ---[Standby 4 (5534)]           |           ---[Standby 5 (5535)]
 

1) Created a table on the primary with some data.
2) Promoted Standby 1
3) Cleanly shut down Primary
4) pg_rewind --target-pgdata=/tmp/primary
--source-server='host=localhost port=5531 dbname=postgres'

Last common WAL position: 0/30227F8 on timeline 1
Last common checkpoint at 0/30227F8 on timeline 1
error reading xlog record: record with zero length at 0/3022860
Done!

Contents of pg_xlog directory in Primary and Standby 1:
thom@swift /tmp $ ls -l primary/pg_xlog/
total 49156
-rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000002
-rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000003
-rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
-rw------- 1 thom users       41 May 23 09:52 00000002.history
drwx------ 2 thom users       80 May 23 09:52 archive_status
thom@swift /tmp $ ls -l standby1/pg_xlog/
total 49156
-rw------- 1 thom users 16777216 May 23 09:49 000000010000000000000002
-rw------- 1 thom users 16777216 May 23 09:50 000000010000000000000003
-rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
-rw------- 1 thom users       41 May 23 09:50 00000002.history
drwx------ 2 thom users       80 May 23 09:50 archive_status

5) Changed recovery.done in primary to point its primary_conninfo port
to 5531 (that of Standby 1).
6) Renamed it to .conf.
7) Changed postgresql.conf to set the port back to its original one
(as pg_rewind has caused it to match that of Standby 1)
8) Start Primary

Latest log in primary reads:

LOG:  database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG:  entering standby mode
LOG:  invalid xl_info in checkpoint record
FATAL:  could not locate required checkpoint record
HINT:  If you are not restoring from a backup, try removing the file
"/tmp/primary/backup_label".
LOG:  startup process (PID 31503) exited with exit code 1
LOG:  aborting startup due to startup process failure

9) Okay, so I'll delete that label and try again.  Now all is well:

LOG:  database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG:  entering standby mode
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/3022828
LOG:  record with zero length at 0/3041A60
LOG:  consistent recovery state reached at 0/3041A60
LOG:  database system is ready to accept read only connections
LOG:  started streaming WAL from primary at 0/3000000 on timeline 2

10) Connect to Standby 1 and insert more rows into our original table.
11) Connect to Primary and those rows are appearing.

--
Thom

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Fujii Masao

Date:

23 May 2013, 18:34:26

On Thu, May 23, 2013 at 8:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> 1. Scan the WAL log of the old cluster, starting from the point where
>> the new cluster's timeline history forked off from the old cluster. For each
>> WAL record, make a note of the data blocks that are touched. This yields a
>> list of all the data blocks that were changed in the old cluster, after the
>> new cluster forked off.
>
> Suppose that a transaction is open and has written tuples at the point
> where WAL forks.  After WAL forks, the transaction commits.  Then, it
> hints some of the tuples that it wrote.  There is no record in WAL
> that those blocks are changed, but failing to revert them leads to
> data corruption.

Yes in asynchronous replication case. But in synchronous replication case,
after WAL forks, hint bits would not be set if their corresponding commit record
is not replicated to the standby. The transaction commit keeps waiting
for the reply
from the standby before updating clog. So, this data corruption would not happen
in sync case.

Regards,

-- 
Fujii Masao

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Robert Haas

Date:

23 May 2013, 19:14:21

On Thu, May 23, 2013 at 11:34 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, May 23, 2013 at 8:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>>> 1. Scan the WAL log of the old cluster, starting from the point where
>>> the new cluster's timeline history forked off from the old cluster. For each
>>> WAL record, make a note of the data blocks that are touched. This yields a
>>> list of all the data blocks that were changed in the old cluster, after the
>>> new cluster forked off.
>>
>> Suppose that a transaction is open and has written tuples at the point
>> where WAL forks.  After WAL forks, the transaction commits.  Then, it
>> hints some of the tuples that it wrote.  There is no record in WAL
>> that those blocks are changed, but failing to revert them leads to
>> data corruption.
>
> Yes in asynchronous replication case. But in synchronous replication case,
> after WAL forks, hint bits would not be set if their corresponding commit record
> is not replicated to the standby. The transaction commit keeps waiting
> for the reply
> from the standby before updating clog. So, this data corruption would not happen
> in sync case.

Not necessarily.  SyncRepWaitForLSN() can be interrupted via a cancel signal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Heikki Linnakangas

Date:

23 May 2013, 20:40:52

On 23.05.2013 07:55, Robert Haas wrote:
> On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com>  wrote:
>> 1. Scan the WAL log of the old cluster, starting from the point where
>> the new cluster's timeline history forked off from the old cluster. For each
>> WAL record, make a note of the data blocks that are touched. This yields a
>> list of all the data blocks that were changed in the old cluster, after the
>> new cluster forked off.
>
> Suppose that a transaction is open and has written tuples at the point
> where WAL forks.  After WAL forks, the transaction commits.  Then, it
> hints some of the tuples that it wrote.  There is no record in WAL
> that those blocks are changed, but failing to revert them leads to
> data corruption.

Bummer, you're right. Hmm, if you have checksums enabled, however, we'll 
WAL log a full-page every time a page is dirtied for setting a hint bit, 
which fixes the problem. So, there's a caveat with pg_rewind; you must 
have checksums enabled.

- Heikki

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Heikki Linnakangas

Date:

23 May 2013, 20:48:42

On 23.05.2013 08:03, Simon Riggs wrote:
> On 23 May 2013 12:10, Heikki Linnakangas<hlinnakangas@vmware.com>  wrote:
>
>> Please take a look: https://github.com/vmware/pg_rewind
>
> The COPYRIGHT file shows that VMware is claiming copyright on unstated
> parts of the code for this. As such, its not a normal submission to
> the PostgreSQL project, which involves placing copyright with the
> PGDG.

We have a lot of code in PostgreSQL source tree with different copyright 
notices, and there's no problem with that as long as the coe is licensed 
under the PostgreSQL license. For patches that add or modify code in 
PostgreSQL, we generally have copyright notices with just PGDG, to avoid 
having a long list of copyright notices of a lot of companies and 
individuals on every file. I'm no lawyer, but I believe there's no 
difference from the legal point of view.

> As a result, while it sounds interesting, people should be aware of
> that and I suggest we shouldn't discuss that code on this list, to
> avoid any disputes should we decide to include a similar facility in
> core Postgres in the future.

That's just paranoia. There are a lot of tools out there on pgfoundry, 
with various copyright holders and even difference licenses, and it's 
fine to talk about all those on this list. Besides, the code is licensed 
under the PostgreSQL license, so if someone decides we should have this 
e.g in contrib, you can just grab the sources and commit. Thirdly, 
there's no reason to refrain from even discussing this, even if someone 
would include a similar facility in core Postgres - this is about 
copyrights, not patents (and yes, this contribution has been cleared by 
VMware legal department; VMware doesn't hold any patents on this)

- Heikki

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Pavan Deolasee

Date:

23 May 2013, 20:52:29

On Thu, May 23, 2013 at 11:10 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 23.05.2013 07:55, Robert Haas wrote:
On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For each
WAL record, make a note of the data blocks that are touched. This yields a
list of all the data blocks that were changed in the old cluster, after the
new cluster forked off.

Suppose that a transaction is open and has written tuples at the point
where WAL forks. After WAL forks, the transaction commits. Then, it
hints some of the tuples that it wrote. There is no record in WAL
that those blocks are changed, but failing to revert them leads to
data corruption.

Bummer, you're right. Hmm, if you have checksums enabled, however, we'll WAL log a full-page every time a page is dirtied for setting a hint bit, which fixes the problem. So, there's a caveat with pg_rewind; you must have checksums enabled.

I was quite impressed with the idea, but hint bits indeed are problem. I realised the same issue also applies to the other idea that Fujii-san and others have suggested about waiting for dirty buffers to be written until the WAL is received at the standby. But since that idea would anyways need to be implemented in the core, we could teach SetHintBits() to return false unless the corresponding commit WAL records are written to the standby first.

Thanks,

Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Thom Brown

Date:

24 May 2013, 17:58:00

On 23 May 2013 10:03, Thom Brown <thom@linux.com> wrote:
> On 23 May 2013 07:10, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> Hi,
>>
>> I've been hacking on a tool to allow resynchronizing an old master server
>> after failover. The need to do a full backup/restore has been a common
>> complaint ever since we've had streaming replication. I saw on the wiki that
>> this was discussed in the dev meeting; too bad I couldn't make it.
>>
>> In a nutshell, the idea is to do copy everything that has changed between
>> the cluster, like rsync does, but instead of reading through all files, use
>> the WAL to determine what has changed. Here's a somewhat more detailed
>> explanation, from the README:
>>
>> Theory of operation
>> -------------------
>>
>> The basic idea is to copy everything from the new cluster to old, except for
>> the blocks that we know to be the same.
>>
>> 1. Scan the WAL log of the old cluster, starting from the point where
>> the new cluster's timeline history forked off from the old cluster. For each
>> WAL record, make a note of the data blocks that are touched. This yields a
>> list of all the data blocks that were changed in the old cluster, after the
>> new cluster forked off.
>>
>> 2. Copy all those changed blocks from the new master to the old master.
>>
>> 3. Copy all other files like clog, conf files etc. from the new cluster
>> to old. Everything except the relation files.
>>
>> 4. Apply the WAL from the new master, starting from the checkpoint
>> created at failover. (pg_rewind doesn't actually apply the WAL, it just
>> creates a backup label file indicating that when PostgreSQL is started, it
>> will start replay from that checkpoint and apply all the required WAL)
>>
>>
>> Please take a look: https://github.com/vmware/pg_rewind
>
> 6 instances set up:
>
> [Primary (5530)]
>     |
>     ---[Standby 1 (5531)]
>             |
>             ---[Standby 2 (5532)]
>             |
>             ---[Standby 3 (5533)]
>             |
>             ---[Standby 4 (5534)]
>             |
>             ---[Standby 5 (5535)]
>
> 1) Created a table on the primary with some data.
> 2) Promoted Standby 1
> 3) Cleanly shut down Primary
> 4) pg_rewind --target-pgdata=/tmp/primary
> --source-server='host=localhost port=5531 dbname=postgres'
>
> Last common WAL position: 0/30227F8 on timeline 1
> Last common checkpoint at 0/30227F8 on timeline 1
> error reading xlog record: record with zero length at 0/3022860
> Done!
>
> Contents of pg_xlog directory in Primary and Standby 1:
> thom@swift /tmp $ ls -l primary/pg_xlog/
> total 49156
> -rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000002
> -rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000003
> -rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
> -rw------- 1 thom users       41 May 23 09:52 00000002.history
> drwx------ 2 thom users       80 May 23 09:52 archive_status
> thom@swift /tmp $ ls -l standby1/pg_xlog/
> total 49156
> -rw------- 1 thom users 16777216 May 23 09:49 000000010000000000000002
> -rw------- 1 thom users 16777216 May 23 09:50 000000010000000000000003
> -rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
> -rw------- 1 thom users       41 May 23 09:50 00000002.history
> drwx------ 2 thom users       80 May 23 09:50 archive_status
>
> 5) Changed recovery.done in primary to point its primary_conninfo port
> to 5531 (that of Standby 1).
> 6) Renamed it to .conf.
> 7) Changed postgresql.conf to set the port back to its original one
> (as pg_rewind has caused it to match that of Standby 1)
> 8) Start Primary
>
> Latest log in primary reads:
>
> LOG:  database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
> LOG:  entering standby mode
> LOG:  invalid xl_info in checkpoint record
> FATAL:  could not locate required checkpoint record
> HINT:  If you are not restoring from a backup, try removing the file
> "/tmp/primary/backup_label".
> LOG:  startup process (PID 31503) exited with exit code 1
> LOG:  aborting startup due to startup process failure
>
> 9) Okay, so I'll delete that label and try again.  Now all is well:
>
> LOG:  database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
> LOG:  entering standby mode
> LOG:  database system was not properly shut down; automatic recovery in progress
> LOG:  redo starts at 0/3022828
> LOG:  record with zero length at 0/3041A60
> LOG:  consistent recovery state reached at 0/3041A60
> LOG:  database system is ready to accept read only connections
> LOG:  started streaming WAL from primary at 0/3000000 on timeline 2
>
> 10) Connect to Standby 1 and insert more rows into our original table.
> 11) Connect to Primary and those rows are appearing.

By the way, without any data inserted I get:

thom@swift /tmp $ pg_rewind --target-pgdata=/tmp/primary
--source-server='host=localhost port=5531 dbname=postgres' -v
connected to remote server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/00000002.history", length 41
Last common WAL position: 0/3000000 on timeline 1
pg_rewind: xlogreader.c:214: XLogReadRecord: Assertion `((RecPtr) %
8192 >= (((intptr_t) ((sizeof(XLogPageHeaderData))) + ((8) - 1)) &
~((intptr_t) ((8) - 1))))' failed.
Aborted (core dumped)

And this magically goes away with -k on initdb.

--
Thom

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Andres Freund

Date:

24 May 2013, 18:06:14

On 2013-05-24 10:57:22 -0400, Thom Brown wrote:
> By the way, without any data inserted I get:
> 
> thom@swift /tmp $ pg_rewind --target-pgdata=/tmp/primary
> --source-server='host=localhost port=5531 dbname=postgres' -v
> connected to remote server
> fetched file "global/pg_control", length 8192
> fetched file "pg_xlog/00000002.history", length 41
> Last common WAL position: 0/3000000 on timeline 1
> pg_rewind: xlogreader.c:214: XLogReadRecord: Assertion `((RecPtr) %
> 8192 >= (((intptr_t) ((sizeof(XLogPageHeaderData))) + ((8) - 1)) &
> ~((intptr_t) ((8) - 1))))' failed.
> Aborted (core dumped)
> 
> And this magically goes away with -k on initdb.

That looks like this is a bug independent of -k. pg_rewind tries to read
the beginning of a page but xlogreader doesn't allow that atm. The
likely reason this doesn't happen with -k is autovacuum probably hints
some pages which are logged when checksums are enabled.

Should we make that assert
Assert((RecPtr % XLOG_BLCKSZ == 0) ||XRecOffIsValid(RecPtr));
instead?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Jim Nasby

Date:

24 May 2013, 18:24:02

On 5/23/13 12:51 PM, Pavan Deolasee wrote:
>
>
>
> On Thu, May 23, 2013 at 11:10 PM, Heikki Linnakangas <hlinnakangas@vmware.com <mailto:hlinnakangas@vmware.com>>
wrote:
>
>     On 23.05.2013 07:55, Robert Haas wrote:
>
>         On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
>         <hlinnakangas@vmware.com <mailto:hlinnakangas@vmware.com>>  wrote:
>
>             1. Scan the WAL log of the old cluster, starting from the point where
>             the new cluster's timeline history forked off from the old cluster. For each
>             WAL record, make a note of the data blocks that are touched. This yields a
>             list of all the data blocks that were changed in the old cluster, after the
>             new cluster forked off.
>
>
>         Suppose that a transaction is open and has written tuples at the point
>         where WAL forks.  After WAL forks, the transaction commits.  Then, it
>         hints some of the tuples that it wrote.  There is no record in WAL
>         that those blocks are changed, but failing to revert them leads to
>         data corruption.
>
>
>     Bummer, you're right. Hmm, if you have checksums enabled, however, we'll WAL log a full-page every time a page is
dirtiedfor setting a hint bit, which fixes the problem. So, there's a caveat with pg_rewind; you must have checksums
enabled.
>
>
> I was quite impressed with the idea, but hint bits indeed are problem. I realised the same issue also applies to the
otheridea that Fujii-san and others have suggested about waiting for dirty buffers to be written until the WAL is
receivedat the standby. But since that idea would anyways need to be implemented in the core, we could teach
SetHintBits()to return false unless the corresponding commit WAL records are written to the standby first.

Would it be useful to turn this problem around? Heikki's proposal is based on being able to track (without fail) all
blocksthat have been modified; could we instead track blocks that we know for certain have NOT been modified? The
differencethere is that we can be more conservative in stating "we know this block is the same"; worst case we just do
someextra copying.

<thinking out loud...>
One possibility would be to use file timestamps. For files that are past a certain age on both master and slave, if we
forcethe timestamp on the slave to match the timestamp from the master, rsync will be able to safely ignore that file.
Irealize that's not as good as block-level detection, but it's probably a tremendous improvement over what we have
today.The critical thing in this case would be to *guarantee* that the timestamps did not match on modified files.

Of course, screwing around with FS timestamps in this manner is pretty grotty, at least on a live system. Perhaps
there'ssome way to track that info separately and then use it to change file timestamps before running rsync. Or if we
areable to define a list of files that we think may have changed, we just feed that list to rsync and let it do the
heavylifting.

-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Amit kapila

Date:

25 May 2013, 10:06:10

On Thursday, May 23, 2013 4:40 PM Heikki Linnakangas wrote:
> Hi,

> I've been hacking on a tool to allow resynchronizing an old master
> server after failover. The need to do a full backup/restore has been a
> common complaint ever since we've had streaming replication. I saw on
> the wiki that this was discussed in the dev meeting; too bad I couldn't
> make it.

> In a nutshell, the idea is to do copy everything that has changed
> between the cluster, like rsync does, but instead of reading through all
> files, use the WAL to determine what has changed. Here's a somewhat more
> detailed explanation, from the README:

This is really a nice idea and an important requirement from many users.

Does this tool handle all kind of operations user would have performaed after
forking of new cluster or it would mandate that user should not have performed certain kind of
operations on old cluster after new cluster forked off?

> Theory of operation
> -------------------

> The basic idea is to copy everything from the new cluster to old, except
> for the blocks that we know to be the same.

> 1. Scan the WAL log of the old cluster, starting from the point where
> the new cluster's timeline history forked off from the old cluster. For
> each WAL record, make a note of the data blocks that are touched. This
> yields a list of all the data blocks that were changed in the old
> cluster, after the new cluster forked off.

a. How about if after forking off new cluster, a new relation gets created in old cluster,  then it might not find the
blocksof same in new cluster, if new cluster also got   the same name relation as old but with different data, it might
geterror when it start  to replay WAL of new master as mentioned in point-4. 

b. How about if after forking off new cluster, an update occurs such that it has to put new   row in new heap page, now
inWAL it will have mention of old and new row blocks (blk-1 and blk-2).  It might be the case new cluster will not have
blk-2,so only blk-1 will be copied from new cluster,  in such scenario, it will have 2 valid versions of same row.  

With Regards,
Amit Kapila.

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Ants Aasma

Date:

25 May 2013, 12:22:49

On Sat, May 25, 2013 at 10:05 AM, Amit kapila <amit.kapila@huawei.com> wrote:
>> In a nutshell, the idea is to do copy everything that has changed
>> between the cluster, like rsync does, but instead of reading through all
>> files, use the WAL to determine what has changed. Here's a somewhat more
>> detailed explanation, from the README:
>
> This is really a nice idea and an important requirement from many users.

+1

> Does this tool handle all kind of operations user would have performaed after
> forking of new cluster or it would mandate that user should not have performed certain kind of
> operations on old cluster after new cluster forked off?

Truncate and all kinds of DROP come to mind, also table rewrites from
ALTER. The tool should probably just flag those relation files to be
copied wholesale.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Bruce Momjian

Date:

28 May 2013, 21:32:20

On Thu, May 23, 2013 at 01:48:24PM -0400, Heikki Linnakangas wrote:
> On 23.05.2013 08:03, Simon Riggs wrote:
> >On 23 May 2013 12:10, Heikki Linnakangas<hlinnakangas@vmware.com>  wrote:
> >
> >>Please take a look: https://github.com/vmware/pg_rewind
> >
> >The COPYRIGHT file shows that VMware is claiming copyright on unstated
> >parts of the code for this. As such, its not a normal submission to
> >the PostgreSQL project, which involves placing copyright with the
> >PGDG.
> 
> We have a lot of code in PostgreSQL source tree with different
> copyright notices, and there's no problem with that as long as the
> coe is licensed under the PostgreSQL license. For patches that add

Really?  Where?  I think we have removed them all, as far as I know.
A quick grep shows:
$ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'./src/backend/regex/regexport.c: * Portions Copyright (c) 1998,
1999Henry Spencer./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry
Spencer./src/include/regex/regexport.h:* Portions Copyright (c) 1998, 1999 Henry Spencer./src/include/getopt_long.h: *
PortionsCopyright (c) 1987, 1993, 1994./src/bin/pg_dump/pg_backup_directory.c: *      Portions Copyright (c) 2000,
PhilipWarner./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994./src/port/getopt_long.c: * Portions
Copyright(c) 2003
 

Can someone comment on the "Philip Warner" item?  Would someone contact
him to clarify we can remove the mention?  CC'ing him.

> or modify code in PostgreSQL, we generally have copyright notices
> with just PGDG, to avoid having a long list of copyright notices of
> a lot of companies and individuals on every file. I'm no lawyer, but
> I believe there's no difference from the legal point of view.

Probably, but some mentions can cause concern when our code is reviewed
by companies, so simplicity is good.

> >As a result, while it sounds interesting, people should be aware of
> >that and I suggest we shouldn't discuss that code on this list, to
> >avoid any disputes should we decide to include a similar facility in
> >core Postgres in the future.
> 
> That's just paranoia. There are a lot of tools out there on
> pgfoundry, with various copyright holders and even difference
> licenses, and it's fine to talk about all those on this list.
> Besides, the code is licensed under the PostgreSQL license, so if
> someone decides we should have this e.g in contrib, you can just
> grab the sources and commit. Thirdly, there's no reason to refrain
> from even discussing this, even if someone would include a similar
> facility in core Postgres - this is about copyrights, not patents
> (and yes, this contribution has been cleared by VMware legal
> department; VMware doesn't hold any patents on this)

I think Simon has a good point, as VMWare has asserted patents on some
changes to their version of Postgres in the past, so if the copyright
mentions VMWare, we can't assume it is patent-free.  Just the fact you
had to check with the VMware legal department verifies there is cause
for concern about things coming from VMWare.  In fact, I am curious what
level of contribution requires a legal check, but I am not sure you can
even share that information.

Anyway, I would love to think we don't need to worry about this, but I
think we do --- not in this case, but in general.  I acknowledge that
VMWare has been disciplined in share only patent-free information, at
the community's request.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Andres Freund

Date:

28 May 2013, 21:38:14

On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote:
> > We have a lot of code in PostgreSQL source tree with different
> > copyright notices, and there's no problem with that as long as the
> > coe is licensed under the PostgreSQL license. For patches that add
>
> Really?  Where?  I think we have removed them all, as far as I know.
> A quick grep shows:
>
>     $ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
>     ./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
>     ./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
>     ./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
>     ./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
>     ./src/bin/pg_dump/pg_backup_directory.c: *      Portions Copyright (c) 2000, Philip Warner
>     ./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
>     ./src/port/getopt_long.c: * Portions Copyright (c) 2003

Just remove the "Portions" part from your grep, and you will see quite
some more...

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Bruce Momjian

Date:

28 May 2013, 21:51:07

On Tue, May 28, 2013 at 08:37:44PM +0200, Andres Freund wrote:
> On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote:
> > > We have a lot of code in PostgreSQL source tree with different
> > > copyright notices, and there's no problem with that as long as the
> > > coe is licensed under the PostgreSQL license. For patches that add
> >
> > Really?  Where?  I think we have removed them all, as far as I know.
> > A quick grep shows:
> >
> >     $ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
> >     ./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
> >     ./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
> >     ./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
> >     ./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
> >     ./src/bin/pg_dump/pg_backup_directory.c: *      Portions Copyright (c) 2000, Philip Warner
> >     ./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
> >     ./src/port/getopt_long.c: * Portions Copyright (c) 2003
> 
> Just remove the "Portions" part from your grep, and you will see quite
> some more...

Oh, I see.  Have we historically been OK with these as long as it is
clear it is the PG copyright?  I know we had do some cleanups in the
past, but I don't remember the details, obviously.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Andres Freund

Date:

28 May 2013, 21:56:05

On 2013-05-28 14:50:57 -0400, Bruce Momjian wrote:
> On Tue, May 28, 2013 at 08:37:44PM +0200, Andres Freund wrote:
> > On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote:
> > > > We have a lot of code in PostgreSQL source tree with different
> > > > copyright notices, and there's no problem with that as long as the
> > > > coe is licensed under the PostgreSQL license. For patches that add
> > >
> > > Really?  Where?  I think we have removed them all, as far as I know.
> > > A quick grep shows:
> > >
> > >     $ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
> > >     ./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
> > >     ./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
> > >     ./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
> > >     ./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
> > >     ./src/bin/pg_dump/pg_backup_directory.c: *      Portions Copyright (c) 2000, Philip Warner
> > >     ./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
> > >     ./src/port/getopt_long.c: * Portions Copyright (c) 2003
> > 
> > Just remove the "Portions" part from your grep, and you will see quite
> > some more...
> 
> Oh, I see.  Have we historically been OK with these as long as it is
> clear it is the PG copyright?  I know we had do some cleanups in the
> past, but I don't remember the details, obviously.

I don't see a problem with a different copyrights as long as the
licenses are compatible. I remember code getting (re-)moved because it
was GPL, which is a different thing to having a different copyright.

I don't have a all that wide look over the history though.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Alvaro Herrera

Date:

28 May 2013, 22:49:26

Bruce Momjian wrote:

> Oh, I see.  Have we historically been OK with these as long as it is
> clear it is the PG copyright?  I know we had do some cleanups in the
> past, but I don't remember the details, obviously.

We've had request from companies because they wanted to distribute
Postgres and lawyers weren't comfortable with copyright statements in
assorted files.  In those cases we've asked the people mentioned in such
copyright statements, got approval to remove the offending copyright
lines, and removed them.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Christophe Pettus

Date:

28 May 2013, 22:53:57

On May 28, 2013, at 12:49 PM, Alvaro Herrera wrote:

> We've had request from companies because they wanted to distribute
> Postgres and lawyers weren't comfortable with copyright statements in
> assorted files.  In those cases we've asked the people mentioned in such
> copyright statements, got approval to remove the offending copyright
> lines, and removed them.

I assume this topic has come up and been rejected for some reason, but just in case: The Django project requires an
explicitagreement for contributions that end up in the main source tree for it, part of which is the acceptance of the
Djangolicense and copyright notice.  (I don't have my copy right in front of me, but I don't think it's a full-on
assignmentof copyright.) 

--
-- Christophe Pettus  xof@thebuild.com

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Bruce Momjian

Date:

28 May 2013, 23:04:22

On Tue, May 28, 2013 at 03:49:14PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > Oh, I see.  Have we historically been OK with these as long as it is
> > clear it is the PG copyright?  I know we had do some cleanups in the
> > past, but I don't remember the details, obviously.
> 
> We've had request from companies because they wanted to distribute
> Postgres and lawyers weren't comfortable with copyright statements in
> assorted files.  In those cases we've asked the people mentioned in such
> copyright statements, got approval to remove the offending copyright
> lines, and removed them.

OK, so it was different _licenses_ that was the problem.  OK.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Josh Berkus

Date:

04 June 2013, 23:55:40

On 05/28/2013 11:32 AM, Bruce Momjian wrote:
> I think Simon has a good point, as VMWare has asserted patents on some
> changes to their version of Postgres in the past, so if the copyright

... which I'll point out that they *didn't* contribute, and which may
yet get resolved in a way that benefits the PostgreSQL community.

> mentions VMWare, we can't assume it is patent-free.  Just the fact you
> had to check with the VMware legal department verifies there is cause
> for concern about things coming from VMWare.  

That seems rather like a catch-22 Bruce.  If they don't check with the
legal department, it's dangerous, but if they do check, it's dangerous?

Presumably if they checked with the legal department, it's cleared.  We
should be wary of stuff contributed by company employees who *didn't* check.

This particular tool seems highly unlikely to be legitimately
patentable, anyway.  There's too much prior art.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

"Joshua D. Drake"

Date:

05 June 2013, 00:15:21

On 06/04/2013 01:55 PM, Josh Berkus wrote:

> That seems rather like a catch-22 Bruce.  If they don't check with the
> legal department, it's dangerous, but if they do check, it's dangerous?
>
> Presumably if they checked with the legal department, it's cleared.  We
> should be wary of stuff contributed by company employees who *didn't* check.
>
> This particular tool seems highly unlikely to be legitimately
> patentable, anyway.  There's too much prior art.

"legitimately patentable" is a rather ethereal phrase in our industry 
right now don't you think?

JD

>


-- 
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms   a rose in the deeps of my heart. - W.B. Yeats

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Bruce Momjian

Date:

05 June 2013, 05:54:57

On Tue, Jun  4, 2013 at 01:55:27PM -0700, Josh Berkus wrote:
> On 05/28/2013 11:32 AM, Bruce Momjian wrote:
> > I think Simon has a good point, as VMWare has asserted patents on some
> > changes to their version of Postgres in the past, so if the copyright
> 
> ... which I'll point out that they *didn't* contribute, and which may
> yet get resolved in a way that benefits the PostgreSQL community.

Right.  Simon was just verifying that we were good with this new
feature, as it had a VMWare copyright and was on github --- totally
legitimate question.

> > mentions VMWare, we can't assume it is patent-free.  Just the fact you
> > had to check with the VMware legal department verifies there is cause
> > for concern about things coming from VMWare.  
> 
> That seems rather like a catch-22 Bruce.  If they don't check with the
> legal department, it's dangerous, but if they do check, it's dangerous?
> 
> Presumably if they checked with the legal department, it's cleared.  We
> should be wary of stuff contributed by company employees who *didn't* check.

My point is that there was no mention of a legal check in the original
posting, which prompted Simon to ask:
http://www.postgresql.org/message-id/519DF910.4020609@vmware.com

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Greg Stark

Date:

05 June 2013, 12:13:09

On Thu, May 23, 2013 at 1:03 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The COPYRIGHT file shows that VMware is claiming copyright on unstated
> parts of the code for this. As such, its not a normal submission to
> the PostgreSQL project, which involves placing copyright with the
> PGDG.

Fwiw I was under the same misconception when I started at Google. But
this is wrong.

We have no copyright assignments to any entity named PGDG. All the
code is copyright the original authors. The PGDG is just a collective
noun for all the the people and organizations who have contributed to
Postgres. As long as all those people or organizations release the
code under the Postgres license then Postgres is ok with it. They
retain ownership of the copyright for the code they wrote but we don't
generally note it at that level of detail and just say everything is
owned by the PGDG.

I'm not a lawyer and I make no judgement on how solid a practice this
is but that's VMware doesn't seem to be doing anything special here.
They can retain copyright ownership of their contributions as long as
they're happy releasing it under the Postgres copyright. Ideally they
wold also be happy with a copyright notice that includes all of the
PGDG just to reduce the maintenance headache.

-- 
greg

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Bruce Momjian

Date:

05 June 2013, 18:32:35

On Wed, Jun  5, 2013 at 10:12:17AM +0100, Greg Stark wrote:
> On Thu, May 23, 2013 at 1:03 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > The COPYRIGHT file shows that VMware is claiming copyright on unstated
> > parts of the code for this. As such, its not a normal submission to
> > the PostgreSQL project, which involves placing copyright with the
> > PGDG.
> 
> 
> Fwiw I was under the same misconception when I started at Google. But
> this is wrong.
> 
> We have no copyright assignments to any entity named PGDG. All the
> code is copyright the original authors. The PGDG is just a collective
> noun for all the the people and organizations who have contributed to
> Postgres. As long as all those people or organizations release the
> code under the Postgres license then Postgres is ok with it. They
> retain ownership of the copyright for the code they wrote but we don't
> generally note it at that level of detail and just say everything is
> owned by the PGDG.
> 
> I'm not a lawyer and I make no judgement on how solid a practice this
> is but that's VMware doesn't seem to be doing anything special here.
> They can retain copyright ownership of their contributions as long as
> they're happy releasing it under the Postgres copyright. Ideally they
> wold also be happy with a copyright notice that includes all of the
> PGDG just to reduce the maintenance headache.

Yes, completely true, and I was not clear on that myself either. 
Several people pointed out similar user copyrights in our existing code,
which I then realized were not a problem.  As long as the copyright
details are the same as our code, anyone can hold the copyright, I
think.

Part of my concern was patents.  Because VMWare asserts patents on
Postgres enhancements, when I saw VMWare copyright code, my "concern"
antenna went up and was glad to find it had all be handled by Heikki
already.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Tom Lane

Date:

05 June 2013, 18:45:23

Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Jun  5, 2013 at 10:12:17AM +0100, Greg Stark wrote:
>> I'm not a lawyer and I make no judgement on how solid a practice this
>> is but that's VMware doesn't seem to be doing anything special here.
>> They can retain copyright ownership of their contributions as long as
>> they're happy releasing it under the Postgres copyright. Ideally they
>> wold also be happy with a copyright notice that includes all of the
>> PGDG just to reduce the maintenance headache.

> Yes, completely true, and I was not clear on that myself either. 
> Several people pointed out similar user copyrights in our existing code,
> which I then realized were not a problem.  As long as the copyright
> details are the same as our code, anyone can hold the copyright, I
> think.

You're both being quite sloppy about the difference between "copyright"
and "license".  The point is correct though: what we care about is that
everybody releases their work under the same *license terms*.  As long
as that's the case, we don't care terribly much exactly who holds
copyright on which parts of the code.  (In this analysis, "PGDG" is
basically a shorthand for "everybody who's ever contributed anything".)

> Part of my concern was patents.  Because VMWare asserts patents on
> Postgres enhancements, when I saw VMWare copyright code, my "concern"
> antenna went up and was glad to find it had all be handled by Heikki
> already.

Yes, patents are a different and much nastier can of worms.
        regards, tom lane

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Josh Berkus

Date:

05 June 2013, 20:10:58

> I'm not a lawyer and I make no judgement on how solid a practice this
>> is but that's VMware doesn't seem to be doing anything special here.
>> They can retain copyright ownership of their contributions as long as
>> they're happy releasing it under the Postgres copyright. Ideally they
>> wold also be happy with a copyright notice that includes all of the
>> PGDG just to reduce the maintenance headache.

Many other projects also take this approach: Linux Kernel, Drizzle, etc.There's some legal advantages, as well as
disadvantages,in having the
 
copyright rest with the original contributors.  Mostly, it prevents
relicensing of the whole project.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: pg_rewind, a tool for resynchronizing an old master after failover

From

Dave Page

Date:

05 June 2013, 20:47:33

On Wed, Jun 5, 2013 at 6:10 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> I'm not a lawyer and I make no judgement on how solid a practice this
>>> is but that's VMware doesn't seem to be doing anything special here.
>>> They can retain copyright ownership of their contributions as long as
>>> they're happy releasing it under the Postgres copyright. Ideally they
>>> wold also be happy with a copyright notice that includes all of the
>>> PGDG just to reduce the maintenance headache.
>
> Many other projects also take this approach: Linux Kernel, Drizzle, etc.
>  There's some legal advantages, as well as disadvantages, in having the
> copyright rest with the original contributors.  Mostly, it prevents
> relicensing of the whole project.

No it doesn't - it just makes it a pain in the arse (I know, I've done it).

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company