Re: pg 8.3 replication causing corruption - Mailing list pgsql-general

From Merlin Moncure
Subject Re: pg 8.3 replication causing corruption
Date
Msg-id CAHyXU0z+Sjm1kWUEPhAOKs_a9a=j47DZ7ViLEtvg=svVH4Znaw@mail.gmail.com
In response to Re: pg 8.3 replication causing corruption  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: pg 8.3 replication causing corruption  (Bob Hatfield <bobhatfield@gmail.com>)
List pgsql-general
On Thu, Oct 13, 2011 at 4:20 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, Oct 13, 2011 at 4:07 PM, Bob Hatfield <bobhatfield@gmail.com> wrote:
>>> have you had any power events?  hard shutdowns, etc? I wonder if the
>>> problem is in the clog files, and not the heap itself.
>>
>> Nothing unusual for as long as I can tell.  Reminder that as long as I
>> don't restart the primary's pg process, everything works fine
>> (secondary's data is intact).
>>
>> It's as if stopping/starting the primary causes a shipped wal file to
>> be corrupt or contain duplicated data then processed by the secondary.
>
> My money is on clog/visibility  related issues.  It's a bit of a bear,
> but can you pull the xmin/xmax/ctid for the two duplicate records on
> the standby and the correspondingly non-duplicated record on the
> master?  I'm curious if the heap blocks are identical and if the
> standby is incorrectly marking a transaction as valid/invalid.
>
> From there, we need to:
> *) figure out the transaction bits in clog on both systems and look
> them up there.
> *) also, look for differences in clog generally
> *) digest the heap block containing the records to see if they are identical
> *) double check hint bits?
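
For the clog lookup step quoted above, here is a rough sketch of how an
xid maps to a spot in the clog files. It assumes the default 8 kB block
size and the usual layout (2 status bits per transaction, 32 pages per
segment file, segment files named as 4-digit hex under pg_clog). The
function names are made up for illustration and are not part of postgres.

```python
# Sketch: map a transaction ID (xid) to its position in the clog files.
# Assumes the default BLCKSZ of 8192 and the standard clog/SLRU layout.

BLCKSZ = 8192
XACTS_PER_BYTE = 4                      # 2 status bits per transaction
XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE
PAGES_PER_SEGMENT = 32

STATUS_NAMES = {0: "in progress", 1: "committed", 2: "aborted", 3: "sub-committed"}


def clog_location(xid):
    """Return (segment file name, byte offset in that file, bit shift)."""
    page = xid // XACTS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    byte_in_page = (xid % XACTS_PER_PAGE) // XACTS_PER_BYTE
    offset = (page % PAGES_PER_SEGMENT) * BLCKSZ + byte_in_page
    shift = (xid % XACTS_PER_BYTE) * 2
    return "%04X" % segment, offset, shift


def clog_status(segment_bytes, xid):
    """Decode the 2-bit status for xid from the raw bytes of its segment file."""
    _, offset, shift = clog_location(xid)
    return STATUS_NAMES[(segment_bytes[offset] >> shift) & 0x03]
```

Feed it the xmin/xmax values pulled from the duplicate rows, read the
named segment file from pg_clog on both the master and the standby, and
compare what clog_status reports on each side.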


Any movement on this? There is considerable interest in resolving any
reproducible issues with postgres replication.  Do you happen to
remember if you set up the standby while the master was under high
load?  Any interesting/unexplained messages in the standby logs?

merlin
