Josh Berkus <josh@agliodbs.com> wrote:
>> One thing I wanted to mention is that non-binary replication has
>> an added advantage over binary from a DR standpoint: if
>> corruption occurs on a master it is more likely to make it into
>> your replicas thanks to full page writes. You might want to
>> consider that depending on how sensitive your data is.
>
> Yeah, we've seen this a few times. We just recently had to rescue
> a client from HS-wide corruption using Slony.

That's an interesting point. Out of curiosity, how did the
corruption originate?

It suggests a couple questions:

(1) Was Slony running before the corruption occurred? If not, how
was Slony helpful? I know that in our environment, where we have
both going through separate streams, with a repository of the
logical transactions, we would use PITR recovery to get to the
latest known good state which we could easily identify, and then
replay the logical transactions to "top it off" to get current. If
necessary we could skip logical transactions which were problematic
results of the corruption.

(2) If logical transactions had been implemented as additions to
the WAL stream, and Slony was using that, do you think they would
still have been usable for this recovery?

Perhaps sending both physical and logical transaction streams over
the WAN isn't such a bad thing, if it gives us more independent
recovery mechanisms. That's fewer copies than we're sending with
current trigger-based techniques. It would be particularly
attractive if we could omit (filter out) certain tables before going
across the WAN. I would be willing to risk sending the big
raster-scanned documents through just the physical channel so long
as I had a nightly compare of md5sum values on both sides so we can
resend any corrupted data (or tell people to rescan).
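
A minimal sketch of that nightly compare, assuming each side can
produce a mapping of document id to document bytes (how those are
fetched is left out, and the function names here are hypothetical,
not part of any existing tool):

```python
import hashlib


def md5_hex(data):
    """Return the hex MD5 digest of a bytes value."""
    return hashlib.md5(data).hexdigest()


def build_manifest(documents):
    """Map each document id to the MD5 digest of its contents.

    `documents` is assumed to be a dict of id -> bytes gathered on one side.
    """
    return {doc_id: md5_hex(data) for doc_id, data in documents.items()}


def documents_to_resend(primary, replica):
    """Return sorted ids that are missing or differ on the replica.

    Both arguments are manifests as produced by build_manifest().
    """
    return sorted(
        doc_id
        for doc_id, digest in primary.items()
        if replica.get(doc_id) != digest
    )
```

Only the manifests would need to cross the WAN each night; the ids
returned by documents_to_resend() are the ones to resend or rescan.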
-Kevin