
Re: pg 8.3 replication causing corruption - Mailing list pgsql-general

From	Merlin Moncure
Subject	Re: pg 8.3 replication causing corruption
Date	October 14, 2011 16:30:03
Msg-id	CAHyXU0z+Sjm1kWUEPhAOKs_a9a=j47DZ7ViLEtvg=svVH4Znaw@mail.gmail.com
In response to	Re: pg 8.3 replication causing corruption (Merlin Moncure <mmoncure@gmail.com>)
Responses	Re: pg 8.3 replication causing corruption
List	pgsql-general


On Thu, Oct 13, 2011 at 4:20 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, Oct 13, 2011 at 4:07 PM, Bob Hatfield <bobhatfield@gmail.com> wrote:
>>> have you had any power events?  hard shutdowns, etc? I wonder if the
>>> problem is in the clog files, and not the heap itself.
>>
>> Nothing unusual for as long as I can tell.  Reminder that as long as I
>> don't restart the primary's pg process, everything works fine
>> (secondary's data is intact).
>>
>> It's as if stopping/starting the primary causes a shipped wal file to
>> be corrupt or contain duplicated data then processed by the secondary.
>
> My money is on clog/visibility  related issues.  It's a bit of a bear,
> but can you pull the xmin/xmax/ctid for the two duplicate records on
> the standby and the correspondingly non-duplicated record on the
> master?  I'm curious if the heap blocks are identical and if the
> standby is incorrectly marking a transaction as valid/invalid.
>
> From there,
>
> We need to:
> *) figure out the transaction bits in clog on both systems and look
> them up there.
> *) also, look for differences in clog generally
> *) digest the heap block containing the records to see if they are identical
> *) double check hint bits?
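
For the "transaction bits in clog" step above, a rough sketch of the
lookup arithmetic may help. It assumes the default 8 kB block size and
the usual clog layout (2 status bits per transaction, 4 transactions
per byte, 32 pages per segment file under pg_clog); the function names
are made up for illustration and are not part of any postgres tool.

```python
# Sketch only: locate a transaction's status bits inside a pg_clog segment.
# Assumes the default build-time block size; adjust BLCKSZ if yours differs.
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32

STATUS_NAMES = {
    0: "in progress",
    1: "committed",
    2: "aborted",
    3: "sub-committed",
}


def clog_location(xid):
    """Return (segment file name, byte offset in that file, bit shift)."""
    page = xid // CLOG_XACTS_PER_PAGE
    segment = "%04X" % (page // SLRU_PAGES_PER_SEGMENT)
    byte_in_page = (xid % CLOG_XACTS_PER_PAGE) // CLOG_XACTS_PER_BYTE
    offset = (page % SLRU_PAGES_PER_SEGMENT) * BLCKSZ + byte_in_page
    shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    return segment, offset, shift


def xact_status(segment_bytes, xid):
    """Decode the status of xid from the raw bytes of its segment file."""
    _, offset, shift = clog_location(xid)
    return STATUS_NAMES[(segment_bytes[offset] >> shift) & 0x03]
```

Feeding it the xmin/xmax values from the duplicate rows, together with
the matching segment file read from each server, would show whether
master and standby disagree about the status of the same transaction.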


Any movement on this? There is considerable interest in tracking down
any reproducible issues with postgres replication.  Do you happen to
remember if you set up the standby while the master was under high
load?  Any interesting/unexplained messages in the standby logs?

merlin
