Re: about fsync in CLOG buffer write - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: about fsync in CLOG buffer write
Date
Msg-id CAMkU=1xURO+spuAMZHWc+OPfgxqvG7Ng235E2c8yP2ybA8XCdQ@mail.gmail.com
In response to Re: about fsync in CLOG buffer write  (Andres Freund <andres@anarazel.de>)
Responses Re: about fsync in CLOG buffer write  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Sat, Sep 12, 2015 at 5:21 PM, Andres Freund <andres@anarazel.de> wrote:
On September 12, 2015 5:18:28 PM PDT, Jeff Janes <jeff.janes@gmail.com> wrote:
>On Wed, Sep 2, 2015 at 5:32 AM, Andres Freund <andres@anarazel.de>
>wrote:
>
>> On 2015-09-10 19:39:59 +0800, 张广舟(明虚) wrote:
>> > We found there is a fsync call when CLOG buffer
>> > is written out in SlruPhysicalWritePage(). It is often called when a
>> > backend needs to check transaction status with SimpleLruReadPage().
>>
>> That's when there aren't enough buffers available and some other page, in
>> your case a dirty one, needs to be written out.
>>
>
>Why bother to find a place to store the page in shared memory at all?  If
>we just want to read it, and it isn't already in shared memory, then why
>not just ask the kernel for the specific byte we need?  The byte we want to
>read can't differ between shared memory and kernel, because it doesn't
>exist in shared memory.

I doubt that'd help - the next access would be more expensive, and we'd need to have a more complex locking regime. These pages aren't necessarily read only at that point.

My (naive) expectation is that no additional locking is needed.  

Once we decide to consult the clog, we already know the transaction is no longer in progress, so nothing can be in flight to change the clog entry we care about: the transaction was required to have set it already.

Once we have verified (under existing locking) that the relevant page is not in memory, we know it can't be dirty in memory.  If someone pulls it into memory after we observe it to be not there, it doesn't matter to us, because whatever transaction they are about to change can't be the one we care about.

Perhaps someone will want the same page later so that they can write to it, and so will have to pull it in.  But we have to play the odds, and the odds are that a page already dirty in memory is more likely to be written to in the near future than another page which was not already dirty and is only needed with read intent.

If we are wrong, all that happens is someone later on has to do the same work that we would have had to do anyway, at no greater cost than if we did it now.  If we are right, we avoid an fsync to make room for the new page, and then later on avoid someone else having to shove out the page we brought in (or a different one) only to replace it with the same page we just wrote, fsynced, and shoved out.
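
To make the "ask the kernel for the specific byte" idea concrete, here is a rough sketch of what such a direct read might look like.  It is only an illustration, not a patch: the names clog_locate(), clog_read_status_direct() and ClogByteLocation are made up for this example, the constants assume the default 8kB BLCKSZ and the usual clog layout (2 bits per transaction, 32 pages per segment file named as four hex digits), and it skips the check under existing locking that the page is really absent from the SLRU buffers, which the argument above depends on.

```c
#define _POSIX_C_SOURCE 200809L   /* for pread() */

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Layout constants; assumed to match a default PostgreSQL build. */
#define BLCKSZ                  8192
#define CLOG_BITS_PER_XACT      2
#define CLOG_XACTS_PER_BYTE     4
#define CLOG_XACTS_PER_PAGE     (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define CLOG_XACT_BITMASK       ((1 << CLOG_BITS_PER_XACT) - 1)
#define SLRU_PAGES_PER_SEGMENT  32

/* Hypothetical helper type: where one xid's status bits live on disk. */
typedef struct ClogByteLocation
{
    uint32_t segno;    /* segment file number */
    off_t    offset;   /* byte offset within that segment file */
    int      bshift;   /* bit shift of the 2-bit status within the byte */
} ClogByteLocation;

/* Map a transaction id to its segment, byte offset and bit shift. */
static ClogByteLocation
clog_locate(uint32_t xid)
{
    ClogByteLocation loc;
    uint32_t pageno = xid / CLOG_XACTS_PER_PAGE;
    uint32_t byteno = (xid % CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_BYTE;

    loc.segno = pageno / SLRU_PAGES_PER_SEGMENT;
    loc.offset = (off_t) (pageno % SLRU_PAGES_PER_SEGMENT) * BLCKSZ + byteno;
    loc.bshift = (int) (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
    return loc;
}

/*
 * Read just the one byte holding xid's status straight from the segment
 * file, without claiming an SLRU buffer.  Returns the 2-bit status, or -1
 * on any I/O error.  The caller is assumed to have already verified that
 * the page is not in shared memory.
 */
static int
clog_read_status_direct(const char *clog_dir, uint32_t xid)
{
    ClogByteLocation loc = clog_locate(xid);
    char          path[1024];
    unsigned char byte;
    ssize_t       nread;
    int           fd;

    snprintf(path, sizeof(path), "%s/%04X", clog_dir, (unsigned) loc.segno);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    nread = pread(fd, &byte, 1, loc.offset);   /* single-byte read */
    close(fd);
    if (nread != 1)
        return -1;
    return (byte >> loc.bshift) & CLOG_XACT_BITMASK;
}
```

A caller would do something like clog_read_status_direct("pg_clog", xid) only on the path where the page was found not to be resident, and fall back to the normal SimpleLruReadPage() route otherwise.  A real version would presumably keep the segment file open rather than opening it per lookup.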

Is there a chance that, if we read a byte from the kernel when someone is in the process of writing adjacent bytes (or writing the same byte, with changes only to bits in it which we don't care about), the kernel will deliver us something which is neither the old value nor the new value, but some monstrosity?

Cheers,

Jeff
