Andres Freund <andres@anarazel.de> writes:
> On 2012-11-09 18:24:25 -0500, Tom Lane wrote:
>> I'm inclined to think that we need to fix this by getting rid of
>> RestoreBkpBlocks per se, and instead having the per-WAL-record restore
>> routines dictate when each full-page image is restored (and whether or
>> not to release the buffer lock immediately). That's not going to be a
>> small change unfortunately :-(
> I wonder if we couldn't instead "fix" it by ensuring the backup blocks
> are in the right order in the backup blocks at the inserting
> location. That would "just" need some care about the order of
> XLogRecData blocks.
I don't think that's a good way to go. In the first place, if we did
that, the fix would require incompatible changes in the contents of WAL
streams. In the second place, there are already severe constraints on
the positioning of backup blocks to ensure that WAL records can be
uniquely decoded (the section of access/transam/README about WAL coding
touches on this) --- I don't think it's a good plan to add still more
constraints there. And in the third place, the specific problem we're
positing here results from a failure to hold the buffer lock for a
full-page image until after we're done restoring a *non* full-page image
represented elsewhere in the same WAL record. In general, of the set
of pages touched by a WAL record, any arbitrary subset of them might be
converted to FPIs during XLogInsert; but the replay-time locking
requirements are going to be the same regardless of that. So AFAICS,
any design in which RestoreBkpBlocks acts independently of the
non-full-page-image updates is just broken.
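
To make the locking point concrete, here is a toy sketch of the shape I
have in mind. It is not PostgreSQL code: ToyBuffer, ToyRecord, toy_redo
and the other names are all hypothetical stand-ins invented for this
example, and the "lock" is just a flag. The only thing it is meant to
show is that the per-record redo routine takes and releases the locks
itself, so the lock sequence is identical whether a given page arrives
as a full-page image or as an incremental change.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE  8
#define TOY_MAX_BLOCKS 2

/* Toy stand-in for a shared buffer: page contents plus a lock flag. */
typedef struct
{
    char data[TOY_PAGE_SIZE];
    bool locked;
} ToyBuffer;

/*
 * Toy WAL record touching up to TOY_MAX_BLOCKS pages.  For each page the
 * inserter may or may not have substituted a full-page image (FPI) for
 * the incremental change; any subset is possible.
 */
typedef struct
{
    int  nblocks;
    bool has_fpi[TOY_MAX_BLOCKS];
    char fpi[TOY_MAX_BLOCKS][TOY_PAGE_SIZE];  /* used if has_fpi[i] */
    int  delta_off[TOY_MAX_BLOCKS];           /* used otherwise */
    char delta_byte[TOY_MAX_BLOCKS];
} ToyRecord;

static void toy_lock(ToyBuffer *buf)
{
    assert(!buf->locked);
    buf->locked = true;
}

static void toy_unlock(ToyBuffer *buf)
{
    assert(buf->locked);
    buf->locked = false;
}

/*
 * Apply the record's change to one page.  The caller must already hold
 * the lock, and this function never releases it: that decision belongs
 * to the per-record redo routine.
 */
static void toy_restore_block(const ToyRecord *rec, int i, ToyBuffer *buf)
{
    assert(buf->locked);
    if (rec->has_fpi[i])
        memcpy(buf->data, rec->fpi[i], TOY_PAGE_SIZE);
    else
        buf->data[rec->delta_off[i]] = rec->delta_byte[i];
}

/*
 * Hypothetical per-record redo routine.  All pages are locked first,
 * then updated, and only then unlocked, regardless of which of them
 * happen to be FPIs.
 */
static void toy_redo(const ToyRecord *rec, ToyBuffer *bufs)
{
    int i;

    for (i = 0; i < rec->nblocks; i++)
        toy_lock(&bufs[i]);
    for (i = 0; i < rec->nblocks; i++)
        toy_restore_block(rec, i, &bufs[i]);
    for (i = rec->nblocks - 1; i >= 0; i--)
        toy_unlock(&bufs[i]);
}
```

Calling toy_redo() with any of the four possible has_fpi combinations
for a two-page record leaves the pages in the same final state, and no
page is ever unlocked while another page of the same record is still
waiting to be updated. A real redo routine would of course pick its own
lock ordering and release points per record type.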
regards, tom lane