Re: Race condition between hot standby and restoring a FPW - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Race condition between hot standby and restoring a FPW
Date
Msg-id 54637938.2020003@vmware.com
In response to Re: Race condition between hot standby and restoring a FPW  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Race condition between hot standby and restoring a FPW
List pgsql-hackers
On 11/12/2014 04:56 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> There's a race condition between a backend running queries in hot
>> standby mode, and restoring a full-page image from a WAL record. It's
>> present in all supported versions.
>
>> I can think of two ways to fix this:
>
>> 1. Have ReadBufferExtended lock the page in RBM_ZERO mode, before
>> returning it. That makes the API inconsistent, as the function would
>> sometimes lock the page, and sometimes not.
>
> Ugh.
>
>> 2. When ReadBufferExtended doesn't find the page in cache, it returns
>> the buffer in !BM_VALID state (i.e. still in I/O in-progress state).
>> Require the caller to call a second function, after locking the page, to
>> finish the I/O.
>
> Not great either.  What about an RBM_NOERROR mode that is like RBM_ZERO
> in terms of handling error conditions, but does not forcibly zero the page
> if it's already valid?

Isn't that exactly what RBM_ZERO_ONERROR does?

Anyway, you don't want to read the page from disk just to check whether
it's already valid. We stopped doing that in 8.2 (commit
8c3cc86e7b688b0efe5ec6ce4f4342c2883b1db5), and it gave a big speedup to
recovery.

(Note that when the page is already in the buffer cache, RBM_ZERO
doesn't zero it. So this race condition can only happen when the page
isn't in the buffer cache yet.)
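
To make the two points above concrete, here is a toy model of the behaviour. It is only a sketch and not the real bufmgr code: ToyBuffer, toy_read_buffer, TOY_ZERO and the disk_reads counter are all made-up names, and locking, pinning and the I/O-in-progress state are left out entirely. It only shows that a zero-style read skips the disk read on a cache miss and leaves an already-cached page alone on a hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_NBUFFERS  4

/* Hypothetical stand-ins for RBM_NORMAL and RBM_ZERO. */
typedef enum { TOY_NORMAL, TOY_ZERO } ToyReadMode;

typedef struct
{
    bool in_cache;              /* does this slot hold a valid page? */
    int  blkno;                 /* which block it holds */
    char data[TOY_PAGE_SIZE];   /* page contents */
} ToyBuffer;

static ToyBuffer toy_pool[TOY_NBUFFERS];
static int disk_reads = 0;      /* counts simulated disk reads */

/* Pretend to read a block from disk; fills the page with 'D'. */
static void
toy_read_from_disk(int blkno, char *dst)
{
    (void) blkno;
    disk_reads++;
    memset(dst, 'D', TOY_PAGE_SIZE);
}

/*
 * Return the buffer for blkno.
 *
 * Cache hit:  return the page as-is, whatever the mode.
 * Cache miss: TOY_ZERO zeroes the page without touching disk,
 *             TOY_NORMAL reads it from disk.
 */
static ToyBuffer *
toy_read_buffer(int blkno, ToyReadMode mode)
{
    ToyBuffer *buf;

    assert(blkno >= 0);
    buf = &toy_pool[blkno % TOY_NBUFFERS];

    if (buf->in_cache && buf->blkno == blkno)
        return buf;             /* hit: never zeroed, even in TOY_ZERO */

    buf->blkno = blkno;
    if (mode == TOY_ZERO)
        memset(buf->data, 0, TOY_PAGE_SIZE);    /* no disk read */
    else
        toy_read_from_disk(blkno, buf->data);
    buf->in_cache = true;
    return buf;
}
```

In this model the window for the race is the miss path in TOY_ZERO mode: the caller gets back a zeroed page and has not yet copied the full-page image into it. The toy has no concurrency, so it cannot reproduce the race itself; it only illustrates which path is involved.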

- Heikki



