Thread: Reliability of WAL replication
We had some corrupted data files in the past (missing clog, see http://archives.postgresql.org/pgsql-bugs/2007-07/msg00124.php) and are thinking about setting up a warm standby system using WAL replication.

Would an error like the one we had appear in the WAL, and would it be replicated too? Or is there some kind of consistency check that prevents broken WAL from being restored?

I posted this question to the bugs list two weeks ago, but haven't received an answer so far. Maybe it was the wrong list for that kind of question, so we'll give it another try here.

Our customer demands a final statement from us, so we would appreciate a prompt reply (I know, it's always urgent, isn't it? ;) ).

Regards,
Marc Schablewski
click:ware Informationstechnik GmbH
Marc,

On Tue, 2007-10-23 at 13:58 +0200, Marc Schablewski wrote:
> We had some corrupted data files in the past (missing clog, see
> http://archives.postgresql.org/pgsql-bugs/2007-07/msg00124.php) and are
> thinking about setting up a warm standby system using WAL replication.
>
> Would an error like the one we had appear in WAL and would it be
> replicated too? Or is there some kind of consistency check, that
> prevents broken WAL from being restored?

We had WAL-based replication in place here some time ago, and the results were somewhat mixed: in one case the corruption was replicated, other times it was not. I guess it depends on where the corruption occurred. I have a feeling that the first case (corruption replicated) was some postgres corner case reacting badly to kill -9 and the like, and that the second case (corruption not replicated) was file system corruption.

I haven't run WAL-based replication for a while, so I don't know what has changed in it lately.

Cheers,
Csaba.
On Tue, 2007-10-23 at 13:58 +0200, Marc Schablewski wrote:
> We had some corrupted data files in the past (missing clog, see
> http://archives.postgresql.org/pgsql-bugs/2007-07/msg00124.php) and are
> thinking about setting up a warm standby system using WAL replication.
>
> Would an error like the one we had appear in WAL and would it be
> replicated too? Or is there some kind of consistency check, that
> prevents broken WAL from being restored?

Each WAL record is CRC-checked, so it is quite unlikely that a record could be corrupted on its own.

The contents of a WAL record may still cause the system to do something wrong on the second server. If that occurs, it usually causes some form of error, so we can see that it has happened, report the bug and then restart replication.

An error of that kind comes from a problem in the PostgreSQL software, not from a fault of the replication technique. That means these incidents are very rare, and we have quickly fixed such bugs when they did occur. I think this has happened twice in 12-18 months.

-- 
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
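
To illustrate the per-record CRC check described above, here is a minimal sketch in Python. It is not PostgreSQL's actual WAL record format or code: the header layout, the names `encode_record` and `replay`, and the use of `zlib.crc32` are all assumptions made for illustration. It only shows the general idea that each record carries a checksum of its contents and that replay stops at the first record whose checksum does not match.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC-32 of the payload.
# This layout is an assumption for illustration, not PostgreSQL's format.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame a payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list[bytes]:
    """Return the payloads that pass the CRC check, stopping at the first bad one."""
    applied = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, expected_crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        # A truncated or damaged record ends replay; nothing after it is applied.
        if len(payload) != length or zlib.crc32(payload) != expected_crc:
            break
        applied.append(payload)
        offset = start + length
    return applied
```

A check like this catches a record that was damaged after it was written, for example on disk or during shipping. It cannot catch a record that was already logically wrong when the primary generated it, since the checksum then matches the wrong contents; that is the software-bug case mentioned above.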