Re: Inadequate thought about buffer locking during hot standby replay - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Inadequate thought about buffer locking during hot standby replay
Msg-id 13637.1352731890@sss.pgh.pa.us
In response to Re: Inadequate thought about buffer locking during hot standby replay  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 12.11.2012 01:25, Tom Lane wrote:
>> Worse than that, GIST WAL replay seems to be a total disaster from this
>> standpoint, and I'm not convinced that it's even fixable without
>> incompatible WAL changes.

> Hmm. After the rewrite of that logic I did in 9.1 (commit 9de3aa65f0), 
> the GiST index is always in a consistent state. Until the downlink for 
> a newly split page has been inserted, its left sibling is flagged so 
> that a search knows to follow the right-link to find it. In normal 
> operation, the lock continuity is needed to prevent another backend from 
> seeing the incomplete split and trying to fix it, but in hot standby, we 
> never try to fix incomplete splits anyway.

> So I think we're good on >= 9.1.
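
To illustrate the mechanism described in the quoted text above, here is a minimal sketch of a search that follows the right-link while the left sibling is flagged. It is a toy model, not the GiST code: ToyPage, follow_right, rightlink, and toy_search are all hypothetical names invented for this illustration, and real pages hold index tuples rather than bare integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page; every name here is hypothetical, not a PostgreSQL structure. */
typedef struct ToyPage {
    int keys[4];
    int nkeys;
    bool follow_right;          /* set while the right sibling has no downlink yet */
    struct ToyPage *rightlink;  /* next sibling on the same level, or NULL */
} ToyPage;

/* Linear scan of the keys stored on one page. */
static bool page_contains(const ToyPage *p, int key)
{
    assert(p->nkeys >= 0 && p->nkeys <= 4);
    for (int i = 0; i < p->nkeys; i++)
        if (p->keys[i] == key)
            return true;
    return false;
}

/*
 * Search starting from the page that the parent's downlink points to.
 * If the page is flagged, the split is incomplete and the key may live
 * on a right sibling that the parent does not know about yet, so keep
 * walking right until an unflagged page has been checked.
 */
bool toy_search(const ToyPage *page, int key)
{
    while (page != NULL) {
        if (page_contains(page, key))
            return true;
        if (!page->follow_right)
            return false;
        page = page->rightlink;
    }
    return false;
}
```

With a flagged left page whose right sibling holds key 42, toy_search(&left, 42) finds the key even though no downlink to the right sibling exists. A reader only needs to see the flag, which is why, in this model, no repair of the incomplete split is required for correct results.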

Okay.  I'll update the patch to make sure that the individual WAL replay
functions hold all locks, but not worry about holding locks across
actions.
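
As a rough sketch of what "hold all locks within one replay function" means, the toy below locks every buffer a single record touches, applies all changes, and only then releases. This is an assumption-laden illustration, not the patch: ToyBuffer, toy_lock, toy_unlock, and toy_redo_split are hypothetical names, and the boolean flag merely stands in for a real buffer content lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy buffer; hypothetical stand-in for a shared buffer and its content lock. */
typedef struct ToyBuffer {
    bool locked;
    int  value;
} ToyBuffer;

static void toy_lock(ToyBuffer *b)   { assert(!b->locked); b->locked = true; }
static void toy_unlock(ToyBuffer *b) { assert(b->locked);  b->locked = false; }

/*
 * Replay one hypothetical "split" record that moves `amount` from the
 * left buffer to the right one.  Both buffers stay locked for the whole
 * function, so no reader can observe the state in which the amount has
 * left one page but not yet arrived on the other.  Nothing is held after
 * the function returns, i.e. locks are not carried across records.
 * Returns true if both locks were held at the inconsistent midpoint.
 */
bool toy_redo_split(ToyBuffer *left, ToyBuffer *right, int amount)
{
    toy_lock(left);
    toy_lock(right);

    left->value -= amount;
    bool both_held_midway = left->locked && right->locked;
    right->value += amount;

    toy_unlock(right);
    toy_unlock(left);
    return both_held_midway;
}
```

Starting from values 10 and 0, replaying a record with amount 4 leaves 6 and 4 with both buffers unlocked. The alternative being ruled out is unlocking the left buffer before locking the right one, which would expose the intermediate state to a concurrent standby query.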

> The 9.0 code is broken, however. In 
> 9.0, when a child page is split, the parent and new children are kept 
> locked until the downlinks are inserted/updated. If a concurrent scan 
> comes along and sees that incomplete state, it will miss tuples on the 
> new right siblings. We rely on a rm_cleanup operation at the end of WAL 
> replay to fix that situation, if the downlink insertion record is not 
> there. I don't see any easy way to fix that, unfortunately. Perhaps we 
> could backpatch the 9.1 rewrite, now that it's gotten some real-world 
> testing, but it was a big change so I don't feel very comfortable doing 
> that.

Me either.  Given the lack of field complaints, I think we're better
advised to just leave it unfixed in 9.0.  It'd not be a step forward
if we broke something trying to make this work.

>> Also, gistRedoPageDeleteRecord seems to be dead code?  I don't see
>> anything that emits XLOG_GIST_PAGE_DELETE.

> Yep. It was used in VACUUM FULL, but has been dead since VACUUM FULL was 
> removed.

Okay.  I see we bumped XLOG_PAGE_MAGIC in 9.0, so there's no longer
any way that 9.0 or later versions could see this WAL record type.
I'll delete that replay function rather than bothering to fix it.

        regards, tom lane


