Thread: Reliability of WAL replication

Reliability of WAL replication

From: Marc Schablewski
Date: Tue, 2007-10-23 13:58 +0200
We had some corrupted data files in the past (missing clog, see
http://archives.postgresql.org/pgsql-bugs/2007-07/msg00124.php) and are
thinking about setting up a warm standby system using WAL replication.

Would an error like the one we had appear in WAL and would it be
replicated too? Or is there some kind of consistency check, that
prevents broken WAL from being restored?

I posted this question to the bugs list two weeks ago, but haven't
received an answer so far. Maybe it was the wrong list for that kind of
question, so we're giving it another try here.

Our customer demands a final statement from us, so we would appreciate a
prompt reply (I know, it's always urgent, isn't it? ;) ).

Regards,

Marc Schablewski
click:ware Informationstechnik GmbH


Re: Reliability of WAL replication

From: Csaba Nagy
Marc,

On Tue, 2007-10-23 at 13:58 +0200, Marc Schablewski wrote:
> We had some corrupted data files in the past (missing clog, see
> http://archives.postgresql.org/pgsql-bugs/2007-07/msg00124.php) and are
> thinking about setting up a warm standby system using WAL replication.
>
> Would an error like the one we had appear in WAL and would it be
> replicated too? Or is there some kind of consistency check, that
> prevents broken WAL from being restored?

We had WAL-based replication in place here some time ago, and the results
were somewhat mixed: in one case the corruption was replicated, other
times it was not... I guess it has to do with where the corruption
occurred, and I have a feeling the first case (corruption replicated)
was some postgres corner case reacting badly to kill -9 and the like,
while the second case (corruption not replicated) was file system
corruption. I haven't run WAL-based replication for a while, so I don't
know what has changed in it lately...

Cheers,
Csaba.



Re: Reliability of WAL replication

From: Simon Riggs
On Tue, 2007-10-23 at 13:58 +0200, Marc Schablewski wrote:
> We had some corrupted data files in the past (missing clog, see
> http://archives.postgresql.org/pgsql-bugs/2007-07/msg00124.php) and are
> thinking about setting up a warm standby system using WAL replication.
>
> Would an error like the one we had appear in WAL and would it be
> replicated too? Or is there some kind of consistency check, that
> prevents broken WAL from being restored?

Each WAL record is CRC checked, so it is quite unlikely that it could be
corrupt on its own.
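
To illustrate the idea of a per-record checksum, here is a minimal sketch
in Python. It is not PostgreSQL's actual WAL format or CRC variant: the
record layout (length, CRC, payload) and the names pack_record and
unpack_record are assumptions made up for this example, and zlib.crc32
stands in for whatever checksum the server really uses.

```python
import struct
import zlib


def pack_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC-32 (hypothetical layout)."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def unpack_record(record: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    length, stored_crc = struct.unpack_from("<II", record)
    payload = record[8:8 + length]
    if len(payload) != length or zlib.crc32(payload) != stored_crc:
        raise ValueError("record failed CRC check")
    return payload


record = pack_record(b"UPDATE accounts SET balance = 10")
assert unpack_record(record) == b"UPDATE accounts SET balance = 10"

# Flip one bit in the payload, as a damaged disk block or transfer might.
damaged = bytearray(record)
damaged[12] ^= 0x01
try:
    unpack_record(bytes(damaged))
except ValueError as exc:
    print(exc)
```

A record damaged after it was written fails the check and is rejected,
which is why corruption of the WAL stream itself is unlikely to go
unnoticed.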

The contents of the WAL record may cause the system to do something
wrong on the second server, but if this occurs it usually causes some
form of error and we can see that this has happened, report the bug and
then restart replication. If that kind of error occurs it is because of
a problem in the PostgreSQL software, not a fault of the replication
technique. That means these incidents are very rare and we have quickly
fixed such bugs when they do occur. I think this has happened twice in
12-18 months.
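
The distinction between a damaged record and a valid record with bad
contents can be sketched as follows. This is only a toy model in Python,
not how PostgreSQL recovery is implemented: replay, make and apply are
hypothetical names, and the "key=value" payloads are invented for the
example. A record whose checksum is intact still reaches the apply step,
and any problem with its contents shows up there as an error that stops
replay.

```python
import zlib


def replay(records, apply):
    """Apply (crc, payload) pairs in order; stop at the first problem.

    Returns (number_applied, reason), where reason is None on success.
    """
    applied = 0
    for stored_crc, payload in records:
        if zlib.crc32(payload) != stored_crc:
            return applied, "crc mismatch"
        try:
            apply(payload)
        except Exception as exc:
            # The record was intact, but its contents could not be applied.
            return applied, f"apply failed: {exc}"
        applied += 1
    return applied, None


def make(payload):
    """Build a (crc, payload) pair with a correct checksum."""
    return zlib.crc32(payload), payload


state = {}


def apply(payload):
    """Toy apply step: payloads must look like b'key=value'."""
    key, _, value = payload.partition(b"=")
    if not value:
        raise ValueError("malformed change")
    state[key] = value


# The second record has a valid CRC but contents the apply step rejects.
log = [make(b"a=1"), make(b"b"), make(b"c=3")]
print(replay(log, apply))  # (1, 'apply failed: malformed change')
```

In this model the checksum only guards against damage after the record
was written; a record that was already wrong when it was generated passes
the check and is caught, if at all, when it is applied.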

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com