Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Date
Msg-id 528F4CA0.30408@vmware.com
Whole thread Raw
In response to Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 19.11.2013 16:20, Andres Freund wrote:
> On 2013-11-18 23:15:59 +0100, Andres Freund wrote:
>> Afaics it's likely a combination/interaction of bugs and fixes between:
>> * the initial HS code
>> * 5a031a5556ff83b8a9646892715d7fef415b83c3
>> * f44eedc3f0f347a856eea8590730769125964597
>
> Yes, the combination of those is guilty.
>
> Man, this is (to a good part my) bad.
>
>> But that'd mean nobody noticed it during 9.3's beta...
>
> It's fairly hard to reproduce artificially since a) there have to be
> enough transactions starting and committing from the start of the
> checkpoint the standby is starting from to the point it does
> LogStandbySnapshot() to cross a 32768 boundary b) hint bits often save
> the game by not accessing clog at all anymore and thus not noticing the
> corruption.
> I've reproduced the issue by having an INSERT ONLY table that's never
> read from. It's helpful to disable autovacuum.

For the archive, here's what I used to reproduce this. It creates master
and a standby, and also uses an INSERT only table. To make it trigger
more easily, it helps to insert sleeps in CreateCheckpoint(), around the
LogStandbySnapshot() call.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Marko Kreen
Date:
Subject: Re: PL/Python: domain over array support
Next
From: Fabrízio de Royes Mello
Date:
Subject: Re: [PATCH] Store Extension Options