On Thu, Oct 31, 2019 at 09:43:47AM +0100, Antonin Houska wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> Isn't this prevented by locking of the buffer header? Both FlushBuffer
>> and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint
>> does a bit of work before, but that's related to BM_PERMANENT.
In FlushBuffer() you have a window after fetching the buffer state and
the header is once unlocked until TerminateBufferIO() gets called
(this would take again a lock on the buffer header), so it is
logically possible for the buffer to be marked as dirty once again
causing the update of the LSN on the page to be missed even if a
backup block has been written in WAL.
> Yes, I had verified it using gdb: inserted a row into a table, then attached
> gdb to checkpointer and stopped it when FlushBuffer() was about to call
> TerminateBufferIO(). Then, when scanning the table, I saw that
> MarkBufferDirtyHint() skipped the call of PageSetLSN(). Finally, before
> checkpointer unlocked the buffer header in TerminateBufferIO(), buf_state was
> 3553624066 ~ 0b11010011110100000000000000000010.
Small typo here: 11010011110100000000000000000010...
> With BM_DIRTY defined as
>
> #define BM_DIRTY (1U << 23)
>
> the flag appeared to be set.
... Still, the bit is set here.
Does something like the attached patch make sense? Reviews are
welcome.
--
Michael