Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Msg-id 528C9392.8000004@vmware.com
In response to Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 19.11.2013 16:22, Andres Freund wrote:
> On 2013-11-19 15:20:01 +0100, Andres Freund wrote:
>> Imo something like the attached patch should be done. The description I came
>> up with is:
>>
>>      Fix Hot-Standby initialization of clog and subtrans.

Looks ok for a back-patchable fix.

It's a bit bizarre that the ExtendSUBTRANS loop in 
ProcArrayApplyRecoveryInfo looks different from the one in 
RecordKnownAssignedTransactionIds, but both look correct to me.

In master, it'd be nice to do some further cleanup. Some gripes:

In ProcArrayApplyXidAssignment, I wonder if it would be best to just 
remove the (standbyState == STANDBY_INITIALIZED) check altogether. The 
KnownAssignedXidsRemoveTree() that follows is harmless if there is 
nothing in the known-assigned-xids array (xact_redo_commit does it in 
STANDBY_INITIALIZED state too). The other thing that's done after that 
check is updating lastOverflowedXid, and AFAICS it would be harmless to 
update that, too. It will be overwritten by the 
ProcArrayApplyRecoveryInfo() call that comes later.
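
To illustrate the argument above, here is a minimal toy model in C. It is
not PostgreSQL source: every toy_ name is hypothetical, xids are compared
as plain integers (the real code uses wraparound-aware comparisons), and
the known-assigned-xids array is reduced to a small fixed-size array. It
only shows that removing subxids from an empty array is a no-op, and that
an early update of the overflow marker is replaced by the later
recovery-info step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;            /* hypothetical stand-in for TransactionId */
#define TOY_MAX_KNOWN 16

typedef struct {
    ToyXid known[TOY_MAX_KNOWN];    /* toy known-assigned-xids array */
    size_t nknown;
    ToyXid last_overflowed_xid;     /* toy lastOverflowedXid */
} ToyStandby;

/* Remove one xid if present; an absent xid is silently ignored. */
static void toy_remove_xid(ToyStandby *s, ToyXid xid)
{
    for (size_t i = 0; i < s->nknown; i++) {
        if (s->known[i] == xid) {
            s->known[i] = s->known[--s->nknown];
            return;
        }
    }
}

/* Toy xid-assignment step with no "standby initialized" early exit. */
void toy_apply_xid_assignment(ToyStandby *s, const ToyXid *subxids, size_t nsub)
{
    assert(nsub == 0 || subxids != NULL);
    for (size_t i = 0; i < nsub; i++) {
        toy_remove_xid(s, subxids[i]);          /* no-op on an empty array */
        if (subxids[i] > s->last_overflowed_xid)
            s->last_overflowed_xid = subxids[i];
    }
}

/* Toy recovery-info step: unconditionally replaces the overflow marker. */
void toy_apply_recovery_info(ToyStandby *s, ToyXid overflowed_xid)
{
    s->last_overflowed_xid = overflowed_xid;
}
```

Calling toy_apply_xid_assignment() on a zero-initialized ToyStandby leaves
nknown at 0, and whatever it stored in last_overflowed_xid is gone after
toy_apply_recovery_info() runs.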

Clog, subtrans and multixact are all handled differently. Extensions of 
clog and multixact are WAL-logged, but extensions of subtrans are not. 
They all have a Startup function, but each serves a slightly different 
purpose. StartupCLOG only sets latest_page_number, but StartupSUBTRANS 
and StartupMultiXact zero out the current page. For CLOG, the TrimCLOG() 
function does that. Why is clog handled differently from multixact?

StartupCLOG() and StartupMultiXact set latest_page_number, but 
StartupSUBTRANS does not. Is that a problem for subtrans? StartupCLOG() 
and StartupMultiXact() are called at different stages in hot standby - 
StartupCLOG() is called at the beginning of recovery, but 
StartupMultiXact() is only called at end of recovery. Why the 
discrepancy? Does latest_page_number need to be set during hot standby 
or not?
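
The differences listed in the two paragraphs above can be collected into
one small C table. This is only a summary of the statements in this
message, not code from the tree: the struct, enum and toy_ names are
hypothetical, and the point in recovery at which StartupSUBTRANS is called
is marked unspecified because the message does not state it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    TOY_CALLED_UNSPECIFIED,         /* not stated in the message */
    TOY_CALLED_AT_RECOVERY_START,
    TOY_CALLED_AT_RECOVERY_END
} ToyStartupTiming;

typedef struct {
    const char      *name;
    bool             extension_wal_logged;
    bool             startup_sets_latest_page_number;
    bool             startup_zeroes_current_page;
    ToyStartupTiming startup_called;
} ToySlruBehavior;

static const ToySlruBehavior toy_slru_behaviors[] = {
    /* clog: the current page is zeroed by TrimCLOG(), not by StartupCLOG */
    {"clog",      true,  true,  false, TOY_CALLED_AT_RECOVERY_START},
    {"multixact", true,  true,  true,  TOY_CALLED_AT_RECOVERY_END},
    {"subtrans",  false, false, true,  TOY_CALLED_UNSPECIFIED},
};

#define TOY_NUM_SLRUS (sizeof toy_slru_behaviors / sizeof toy_slru_behaviors[0])

/* Bounds-checked accessor for one row of the table. */
const ToySlruBehavior *toy_slru_behavior(size_t i)
{
    assert(i < TOY_NUM_SLRUS);
    return &toy_slru_behaviors[i];
}

/* True only if every SLRU matches the first one in every column. */
bool toy_slrus_behave_alike(void)
{
    const ToySlruBehavior *a = &toy_slru_behaviors[0];
    for (size_t i = 1; i < TOY_NUM_SLRUS; i++) {
        const ToySlruBehavior *b = &toy_slru_behaviors[i];
        if (a->extension_wal_logged != b->extension_wal_logged ||
            a->startup_sets_latest_page_number != b->startup_sets_latest_page_number ||
            a->startup_zeroes_current_page != b->startup_zeroes_current_page ||
            a->startup_called != b->startup_called)
            return false;
    }
    return true;
}
```

With the rows as given, toy_slrus_behave_alike() returns false, which is
the inconsistency being questioned.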

I think we should bite the bullet, and WAL-log the extension of 
subtrans, too. Then make the startup and extension procedure for all the 
SLRUs the same.
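
A rough sketch of what "the same procedure for all the SLRUs" could look
like, as a self-contained toy in C. None of this is PostgreSQL's actual
SLRU or WAL API: ToySlru, ToyWal, the record layout and all toy_ functions
are hypothetical, and page contents are reduced to a few bytes. The idea
shown is just that every extension emits a "zero page" record, and replay
applies that record identically regardless of which SLRU it targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 8
#define TOY_MAX_PAGES 4
#define TOY_MAX_WAL   8

typedef struct {
    int           latest_page_number;
    bool          page_exists[TOY_MAX_PAGES];
    unsigned char pages[TOY_MAX_PAGES][TOY_PAGE_SIZE];
} ToySlru;

/* Toy WAL record: "zero page pageno of SLRU slru_id". */
typedef struct { int slru_id; int pageno; } ToyZeroPageRecord;

typedef struct {
    ToyZeroPageRecord records[TOY_MAX_WAL];
    int               n;
} ToyWal;

/* One startup routine for every SLRU: only record the current page. */
void toy_slru_startup(ToySlru *s, int current_page)
{
    s->latest_page_number = current_page;
}

/* Shared by normal operation and replay, so both end in the same state. */
static void toy_zero_page(ToySlru *s, int pageno)
{
    assert(pageno >= 0 && pageno < TOY_MAX_PAGES);
    memset(s->pages[pageno], 0, TOY_PAGE_SIZE);
    s->page_exists[pageno] = true;
    s->latest_page_number = pageno;
}

/* Extension on the primary: log the record first, then zero the page. */
void toy_slru_extend(ToySlru *s, int slru_id, int pageno, ToyWal *wal)
{
    assert(wal->n < TOY_MAX_WAL);
    wal->records[wal->n++] = (ToyZeroPageRecord){slru_id, pageno};
    toy_zero_page(s, pageno);
}

/* Replay on the standby: the same action for clog, multixact or subtrans. */
void toy_slru_redo(ToySlru *slrus, int nslrus, const ToyWal *wal)
{
    for (int i = 0; i < wal->n; i++) {
        const ToyZeroPageRecord *r = &wal->records[i];
        assert(r->slru_id >= 0 && r->slru_id < nslrus);
        toy_zero_page(&slrus[r->slru_id], r->pageno);
    }
}
```

In this model a standby that replays the toy WAL ends up with the same
zeroed pages and the same latest_page_number as the primary, without
needing a per-SLRU special case at startup.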

- Heikki


