On Nov 18, 2013, at 12:57 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Were there any kind of patterns in the lost data? What kind of workload
> are they running? I have an idea what the issue might be...
On the P1 > S1 case, the corrupted data was data modified in the last few minutes before the switchover. I don't want
to over-analyze, but it was within the checkpoint_timeout value for that server.
On the P2 > S2 case, it's less obvious what the pattern is, since there was no cutover.
Insufficient information on the P3 > S3 case.
Each of them is a reasonably high-volume OLTP-style workload. The P1/P2 client has a very high level of writes; the P3
client is more read-heavy, but still with a fair number of writes.
--
-- Christophe Pettus xof@thebuild.com