Re: Inconsistent DB data in Streaming Replication - Mailing list pgsql-hackers

From Samrat Revagade
Subject Re: Inconsistent DB data in Streaming Replication
Msg-id CAF8Q-GwH0N7yFUT+QophzsC5z7+7KxRjWPdTUASGzvaO2rgyxw@mail.gmail.com
In response to Re: Inconsistent DB data in Streaming Replication  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Inconsistent DB data in Streaming Replication
List pgsql-hackers

>What Samrat is proposing here is that WAL is not flushed to the OS before
>it is acked by a synchronous replica so recovery won't go past the
>timeline change made in failover, making it necessary to take a new
>base backup to resync with the new master.

Actually, we are proposing that the master does not write a data page to disk until it has received an ACK from the standby. The WAL files can be flushed to disk on both the master and the standby before the standby sends the ACK to the master. The end objective is the same: avoiding the need to take a new base backup of the old master to resync it with the new master.

>Why do you think that the inconsistent data after failover happens is
>problem? Because it's one of the reasons why a fresh base backup is
>required when starting old master as new standby? If yes, I agree with
>you. I've often heard the complaints about a backup when restarting new
>standby. That's really big problem.

Yes, taking a backup is a major problem when the database is larger than several TB. It can take a very long time to ship the backup data over a slow WAN link.
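To put rough numbers on this, here is a back-of-the-envelope sketch. The function name is hypothetical, and the figures used below (5 TB over a 100 Mbit/s WAN link) are assumptions chosen only for illustration, not measurements.

```c
/*
 * Rough estimate of how many hours it takes to ship a base backup.
 * Assumes the link is fully used for the whole transfer and ignores
 * protocol overhead, compression and retransmissions.
 */
double
backup_transfer_hours(double backup_bytes, double link_bits_per_sec)
{
    double seconds = backup_bytes * 8.0 / link_bits_per_sec;

    return seconds / 3600.0;
}
```

For example, backup_transfer_hours(5e12, 100e6) comes to about 111 hours, i.e. roughly 4.6 days, before the old master can even start streaming as the new standby.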

>> One solution to avoid this situation is have the master send WAL records to standby and wait for ACK from standby committing WAL files to disk and only after that commit data page related to this transaction on master.

>You mean to make the master wait the data page write until WAL has been not only
>flushed to disk but also replicated to the standby?

Yes. The master should not write the data page until the corresponding WAL records have been replicated to the standby, that is, until those WAL records have been flushed to disk on both the master and the standby.
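A minimal sketch of the ordering rule we have in mind, written as a standalone C check. The names `page_write_allowed`, `WalPos` and `wait_for_standby` are hypothetical and are not existing PostgreSQL functions or variables; `WalPos` simply stands in for a WAL position such as `XLogRecPtr`. The first test is the usual WAL-before-data rule, the second is the proposed extra condition, with a flag that a GUC could control.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (byte offset in the WAL stream). */
typedef uint64_t WalPos;

/*
 * Hypothetical check run before a dirty data page is written to disk.
 *   page_lsn          - WAL position of the last change to the page
 *   local_flush_lsn   - WAL flushed to disk on the master
 *   standby_flush_lsn - WAL the standby has acknowledged as flushed
 *   wait_for_standby  - proposed switch (could be controlled by a GUC)
 * Returns false if the write must be deferred until more WAL has been
 * flushed locally or acknowledged by the standby.
 */
bool
page_write_allowed(WalPos page_lsn, WalPos local_flush_lsn,
                   WalPos standby_flush_lsn, bool wait_for_standby)
{
    /* Usual WAL-before-data rule: WAL covering the page is flushed locally. */
    if (page_lsn > local_flush_lsn)
        return false;

    /* Proposed extra rule: the same WAL is also flushed on the standby. */
    if (wait_for_standby && page_lsn > standby_flush_lsn)
        return false;

    return true;
}
```

With a page last modified at 0x3000, local WAL flushed to 0x4000 and the standby acknowledging only up to 0x2000, the write is deferred when wait_for_standby is true and allowed when it is false. Under this rule the old master's data files never contain changes the standby has not yet received. (In PostgreSQL itself, the first condition is enforced by flushing WAL up to the page LSN before the write rather than by refusing it.)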

>> The main drawback would be increased wait time for the client due to extra round trip to standby before master sends ACK to client. Are there any other issues with this approach?

>I think that you can introduce GUC specifying whether this extra check
>is required to avoid a backup when failback

That would be a better idea. We could then disable the extra check whenever taking a fresh backup is not a problem.


Regards,

Samrat  



On Mon, Apr 8, 2013 at 10:40 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Apr 8, 2013 at 7:34 PM, Samrat Revagade <revagade.samrat@gmail.com> wrote:
>
> Hello,
>
> We have been trying to figure out possible solutions to the following problem in streaming replication Consider following scenario:
>
> If master receives commit command, it writes and flushes commit WAL records to the disk, It also writes and flushes data page related to this transaction.
>
> The master then sends WAL records to standby up to the commit WAL record. But before sending these records if failover happens then,  old master is ahead of  standby which is now the new master in terms of DB data leading to inconsistent data .

Why do you think that the inconsistent data after failover happens is
problem? Because it's one of the reasons why a fresh base backup is
required when starting old master as new standby? If yes, I agree with
you. I've often heard the complaints about a backup when restarting new
standby. That's really big problem.

The timeline mismatch after failover was one of the reasons why a
backup is required. But, thanks to Heikki's recent work, that's solved,
i.e., the timeline mismatch would be automatically resolved when
starting replication in 9.3. So, the remaining problem is an
inconsistent database.

> One solution to avoid this situation is have the master send WAL records to standby and wait for ACK from standby committing WAL files to disk and only after that commit data page related to this transaction on master.

You mean to make the master wait the data page write until WAL has been not only
flushed to disk but also replicated to the standby?

> The main drawback would be increased wait time for the client due to extra round trip to standby before master sends ACK to client. Are there any other issues with this approach?

I think that you can introduce GUC specifying whether this extra check
is required to avoid a backup when failback.

Regards,

--
Fujii Masao
