Re: Inconsistent DB data in Streaming Replication - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: Inconsistent DB data in Streaming Replication |
Date | |
Msg-id | 5166C9A3.4020105@2ndQuadrant.com |
In response to | Re: Inconsistent DB data in Streaming Replication (Ants Aasma <ants@cybertec.at>) |
Responses | Re: Inconsistent DB data in Streaming Replication |
List | pgsql-hackers |
On 04/11/2013 03:52 PM, Ants Aasma wrote:
> On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> The proposed fix - halting all writes of data pages to disk and
>> to WAL files while waiting ACK from standby - will tremendously
>> slow down all parallel work on master.
> This is not what is being proposed. The proposed fix halts writes of
> only data pages that are modified within the window of WAL that is not
> yet ACKed by the slave. This means pages that were recently modified
> and where the clocksweep or checkpoint has decided to evict them. This
> only affects the checkpointer, bgwriter and backends doing allocation.
> Furthermore, for the backend clocksweep case it would be reasonable to
> just pick another buffer to evict. The slowdown for most actual cases
> will be negligible.

You also need to hold back all WAL writes, including the ones by
parallel async and locally-synced transactions. This means that you
have to make all locally synced transactions wait on the syncrep
transactions committed before them.

After getting the ACK from the slave you then have a backlog of stuff
to write locally, which then also needs to be sent to the slave.
Basically this turns a nice smooth WAL write-and-stream pipeline into
a chunky wait-and-write-and-wait-and-stream-and-wait :P

This may not be a problem under light write load, which is probably
the most common use case for postgres, but it will harm top
performance and also force people to get much better (and more
expensive) hardware than would otherwise be needed.

>> And it does just turn around "master is ahead of slave" problem
>> into "slave is ahead of master" problem :)
> The issue is not being ahead or behind. The issue is ensuring WAL
> durability in the face of failovers before modifying data pages. This
> is sufficient to guarantee no forks in the WAL stream from the point
> of view of data files and with that the capability to always recover
> by replaying WAL.
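For reference, the rule Ants describes (do not write out a data page until the WAL covering its last change has been ACKed by the standby, and let the clocksweep pick another buffer instead) could be sketched roughly as below. This is only an illustration in Python, not PostgreSQL code; `Buffer`, `can_write_out`, `pick_victim` and `acked_lsn` are hypothetical names, and LSNs are modelled as plain integers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Buffer:
    page_id: int
    page_lsn: int       # LSN of the last WAL record that modified this page
    dirty: bool = True


def can_write_out(buf: Buffer, acked_lsn: int) -> bool:
    """A clean page can always be evicted; a dirty page only once the
    standby has ACKed WAL up to (at least) the page's LSN."""
    return (not buf.dirty) or buf.page_lsn <= acked_lsn


def pick_victim(buffers: List[Buffer], start: int, acked_lsn: int) -> Optional[Buffer]:
    """Clock-sweep style scan starting at `start`: skip buffers that are
    still inside the un-ACKed WAL window and take the next eligible one."""
    n = len(buffers)
    for i in range(n):
        buf = buffers[(start + i) % n]
        if can_write_out(buf, acked_lsn):
            return buf
    return None  # every candidate is waiting for an ACK from the standby
```

The `None` case is where the stall shows up: if every candidate buffer was modified inside the un-ACKed window, the backend has nothing to evict until the standby responds.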
How would this handle the case Tom pointed out, namely a short power
cycle on the master?

Instead of just continuing after booting up again, the master now has
to figure out whether it had any slaves and then try to query them
(for how long?) to find out whether they have replayed any WAL the
master does not know of.

Suddenly the mere existence of streaming replica slaves has become a
problem for the master!

This will especially complicate the case of multiple slaves, each
having received WAL up to a slightly different LSN. And you do want to
have at least 2 slaves if you want both durability and availability
with syncrep.

What if one of the slaves disconnects? How should the master react to
this?

Regards
Hannu Krosing
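To make the restart questions above concrete, here is a rough Python sketch of the bookkeeping a rebooted master would face. It is not a proposed design: `wal_fork_check` and its arguments are hypothetical, LSNs are plain integers, and an unreachable slave is modelled as `None`.

```python
from typing import Dict, Optional


def wal_fork_check(local_lsn: int, standby_lsns: Dict[str, Optional[int]]) -> dict:
    """Compare the master's last locally durable LSN with what each slave
    reports having received. `None` means the slave could not be reached."""
    unreachable = sorted(name for name, lsn in standby_lsns.items() if lsn is None)
    ahead = {
        name: lsn
        for name, lsn in standby_lsns.items()
        if lsn is not None and lsn > local_lsn
    }
    return {
        "ahead_of_master": ahead,      # slaves holding WAL the master lacks
        "unreachable": unreachable,    # slaves whose state is unknown
        "safe_to_continue": not ahead and not unreachable,
    }
```

For example, `wal_fork_check(100, {"a": 100, "b": 105, "c": None})` reports slave `b` as ahead and slave `c` as unknown, so the master cannot simply continue. How long to wait for `c` is exactly the open question.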