Thread: MarkBufferDirtyHint() and LSN update

MarkBufferDirtyHint() and LSN update

From
Antonin Houska
Date:
Please consider this scenario (race conditions):

1. FlushBuffer() has written the buffer but hasn't yet managed to clear the
BM_DIRTY flag (however BM_JUST_DIRTIED could be cleared by now).

2. Another backend modified a hint bit and called MarkBufferDirtyHint().

3. In MarkBufferDirtyHint(), if XLogHintBitIsNeeded() evaluates to true
(e.g. due to checksums enabled), new LSN is computed, however it's not
assigned to the page because the buffer is still dirty:

    if (!(buf_state & BM_DIRTY))
    {
        ...

        if (!XLogRecPtrIsInvalid(lsn))
            PageSetLSN(page, lsn);
    }

4. MarkBufferDirtyHint() completes.

5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
BM_DIRTY because MarkBufferDirtyHint() has eventually set
BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
call of FlushBuffer(). However page LSN is hasn't been updated so the
requirement that WAL must be flushed first is not met.

I think that PageSetLSN() should be called regardless BM_DIRTY. Do I miss any
subtle detail?

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: MarkBufferDirtyHint() and LSN update

From
Tomas Vondra
Date:
On Wed, Oct 30, 2019 at 02:44:18PM +0100, Antonin Houska wrote:
>Please consider this scenario (race conditions):
>
>1. FlushBuffer() has written the buffer but hasn't yet managed to clear the
>BM_DIRTY flag (however BM_JUST_DIRTIED could be cleared by now).
>
>2. Another backend modified a hint bit and called MarkBufferDirtyHint().
>
>3. In MarkBufferDirtyHint(), if XLogHintBitIsNeeded() evaluates to true
>(e.g. due to checksums enabled), new LSN is computed, however it's not
>assigned to the page because the buffer is still dirty:
>
>    if (!(buf_state & BM_DIRTY))
>    {
>        ...
>
>        if (!XLogRecPtrIsInvalid(lsn))
>            PageSetLSN(page, lsn);
>    }
>
>4. MarkBufferDirtyHint() completes.
>
>5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
>BM_DIRTY because MarkBufferDirtyHint() has eventually set
>BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
>call of FlushBuffer(). However page LSN is hasn't been updated so the
>requirement that WAL must be flushed first is not met.
>
>I think that PageSetLSN() should be called regardless BM_DIRTY. Do I miss any
>subtle detail?
>

Isn't this prevented by locking of the buffer header? Both FlushBuffer
and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint
does a bit of work before, but that's related to BM_PERMANENT.

If there really is a race condition, it shouldn't be that difficult to
trigger it by adding a sleep or a breakpoint, I guess.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: MarkBufferDirtyHint() and LSN update

From
Antonin Houska
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On Wed, Oct 30, 2019 at 02:44:18PM +0100, Antonin Houska wrote:
> >Please consider this scenario (race conditions):
> >
> >1. FlushBuffer() has written the buffer but hasn't yet managed to clear the
> >BM_DIRTY flag (however BM_JUST_DIRTIED could be cleared by now).
> >
> >2. Another backend modified a hint bit and called MarkBufferDirtyHint().
> >
> >3. In MarkBufferDirtyHint(), if XLogHintBitIsNeeded() evaluates to true
> >(e.g. due to checksums enabled), new LSN is computed, however it's not
> >assigned to the page because the buffer is still dirty:
> >
> >    if (!(buf_state & BM_DIRTY))
> >    {
> >        ...
> >
> >        if (!XLogRecPtrIsInvalid(lsn))
> >            PageSetLSN(page, lsn);
> >    }
> >
> >4. MarkBufferDirtyHint() completes.
> >
> >5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
> >BM_DIRTY because MarkBufferDirtyHint() has eventually set
> >BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
> >call of FlushBuffer(). However page LSN is hasn't been updated so the
> >requirement that WAL must be flushed first is not met.
> >
> >I think that PageSetLSN() should be called regardless BM_DIRTY. Do I miss any
> >subtle detail?
> >
>
> Isn't this prevented by locking of the buffer header? Both FlushBuffer
> and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint
> does a bit of work before, but that's related to BM_PERMANENT.
>
> If there really is a race condition, it shouldn't be that difficult to
> trigger it by adding a sleep or a breakpoint, I guess.

Yes, I had verified it using gdb: inserted a row into a table, then attached
gdb to checkpointer and stopped it when FlushBuffer() was about to call
TerminateBufferIO(). Then, when scanning the table, I saw that
MarkBufferDirtyHint() skipped the call of PageSetLSN(). Finally, before
checkpointer unlocked the buffer header in TerminateBufferIO(), buf_state was
3553624066 ~ 0b11010011110100000000000000000010.

With BM_DIRTY defined as

    #define BM_DIRTY                (1U << 23)

the flag appeared to be set.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: MarkBufferDirtyHint() and LSN update

From
Robert Haas
Date:
On Wed, Oct 30, 2019 at 9:43 AM Antonin Houska <ah@cybertec.at> wrote:
> 5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
> BM_DIRTY because MarkBufferDirtyHint() has eventually set
> BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
> call of FlushBuffer(). However page LSN is hasn't been updated so the
> requirement that WAL must be flushed first is not met.

This part confuses me. Are you saying that MarkBufferDirtyHint() can
set BM_JUST_DIRTIED without setting BM_DIRTY?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: MarkBufferDirtyHint() and LSN update

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Oct 30, 2019 at 9:43 AM Antonin Houska <ah@cybertec.at> wrote:
> > 5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
> > BM_DIRTY because MarkBufferDirtyHint() has eventually set
> > BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
> > call of FlushBuffer(). However page LSN is hasn't been updated so the
> > requirement that WAL must be flushed first is not met.
>
> This part confuses me. Are you saying that MarkBufferDirtyHint() can
> set BM_JUST_DIRTIED without setting BM_DIRTY?

No, I'm saying that MarkBufferDirtyHint() leaves BM_DIRTY set, as
expected. However, if things happen in the order I described, then LSN
returned by XLogSaveBufferForHint() is not assigned to the page.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: MarkBufferDirtyHint() and LSN update

From
Michael Paquier
Date:
On Thu, Oct 31, 2019 at 09:43:47AM +0100, Antonin Houska wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> Isn't this prevented by locking of the buffer header? Both FlushBuffer
>> and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint
>> does a bit of work before, but that's related to BM_PERMANENT.

In FlushBuffer() you have a window after fetching the buffer state and
the header is once unlocked until TerminateBufferIO() gets called
(this would take again a lock on the buffer header), so it is
logically possible for the buffer to be marked as dirty once again
causing the update of the LSN on the page to be missed even if a
backup block has been written in WAL.

> Yes, I had verified it using gdb: inserted a row into a table, then attached
> gdb to checkpointer and stopped it when FlushBuffer() was about to call
> TerminateBufferIO().  Then, when scanning the table, I saw that
> MarkBufferDirtyHint() skipped the call of PageSetLSN(). Finally, before
> checkpointer unlocked the buffer header in TerminateBufferIO(), buf_state was
> 3553624066 ~ 0b11010011110100000000000000000010.

Small typo here: 11010011110100000000000000000010...

> With BM_DIRTY defined as
>
>     #define BM_DIRTY                (1U << 23)
>
> the flag appeared to be set.

...  Still, the bit is set here.

Does something like the attached patch make sense?  Reviews are
welcome.
--
Michael

Attachment

Re: MarkBufferDirtyHint() and LSN update

From
Antonin Houska
Date:
Michael Paquier <michael@paquier.xyz> wrote:
> Does something like the attached patch make sense?  Reviews are
> welcome.

This looks good to me.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: MarkBufferDirtyHint() and LSN update

From
Kyotaro Horiguchi
Date:
At Mon, 11 Nov 2019 10:03:14 +0100, Antonin Houska <ah@cybertec.at> wrote in 
> Michael Paquier <michael@paquier.xyz> wrote:
> > Does something like the attached patch make sense?  Reviews are
> > welcome.
> 
> This looks good to me.

I have a qustion.

The current code assumes that !BM_DIRTY means that the function is
dirtying the page.  But if !BM_JUST_DIRTIED, the function actually is
going to re-dirty the page even if BM_DIRTY.

If this is correct, the trigger for stats update is not !BM_DIRTY but
!BM_JUST_DIRTIED, or the fact that we passed the line of
XLogSaveBufferForHint() could be the trigger, regardless whether the
LSN is valid or not.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: MarkBufferDirtyHint() and LSN update

From
Antonin Houska
Date:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> At Mon, 11 Nov 2019 10:03:14 +0100, Antonin Houska <ah@cybertec.at> wrote in
> > Michael Paquier <michael@paquier.xyz> wrote:
> > > Does something like the attached patch make sense?  Reviews are
> > > welcome.
> >
> > This looks good to me.
>
> I have a qustion.
>
> The current code assumes that !BM_DIRTY means that the function is
> dirtying the page.  But if !BM_JUST_DIRTIED, the function actually is
> going to re-dirty the page even if BM_DIRTY.

It makes sense to me. I can imagine the following:

1. FlushBuffer() cleared BM_JUST_DIRTIED, wrote the page to disk but hasn't
yet cleared BM_DIRTY.

2. Another backend changed a hint bit in shared memory and called
MarkBufferDirtyHint().

Thus FlushBuffer() missed the current hint bit change, so we need to dirty the
page again.

> If this is correct, the trigger for stats update is not !BM_DIRTY but
> !BM_JUST_DIRTIED, or the fact that we passed the line of
> XLogSaveBufferForHint() could be the trigger, regardless whether the
> LSN is valid or not.

I agree.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: MarkBufferDirtyHint() and LSN update

From
Michael Paquier
Date:
On Mon, Nov 11, 2019 at 10:03:14AM +0100, Antonin Houska wrote:
> This looks good to me.

Actually, no, this is not good.  I have been studying more the patch,
and after stressing more this code path with a cluster having
checksums enabled and shared_buffers at 1MB, I have been able to make
a couple of page's LSNs go backwards with pgbench -s 100.  The cause
was simply that the page got flushed with a newer LSN than what was
returned by XLogSaveBufferForHint() before taking the buffer header
lock, so updating only the LSN for a non-dirty page was simply
guarding against that.
--
Michael

Attachment

Re: MarkBufferDirtyHint() and LSN update

From
Michael Paquier
Date:
On Wed, Nov 13, 2019 at 09:17:03PM +0900, Michael Paquier wrote:
> Actually, no, this is not good.  I have been studying more the patch,
> and after stressing more this code path with a cluster having
> checksums enabled and shared_buffers at 1MB, I have been able to make
> a couple of page's LSNs go backwards with pgbench -s 100.  The cause
> was simply that the page got flushed with a newer LSN than what was
> returned by XLogSaveBufferForHint() before taking the buffer header
> lock, so updating only the LSN for a non-dirty page was simply
> guarding against that.

for the reference attached is the trick I have used, adding an extra
assertion check in PageSetLSN().
--
Michael

Attachment

Re: MarkBufferDirtyHint() and LSN update

From
Kyotaro Horiguchi
Date:
At Thu, 14 Nov 2019 12:01:29 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Wed, Nov 13, 2019 at 09:17:03PM +0900, Michael Paquier wrote:
> > Actually, no, this is not good.  I have been studying more the patch,
> > and after stressing more this code path with a cluster having
> > checksums enabled and shared_buffers at 1MB, I have been able to make
> > a couple of page's LSNs go backwards with pgbench -s 100.  The cause
> > was simply that the page got flushed with a newer LSN than what was
> > returned by XLogSaveBufferForHint() before taking the buffer header
> > lock, so updating only the LSN for a non-dirty page was simply
> > guarding against that.

I thought of something like that but forgot to mention. But after
thinking of that, couldn't the current code can do the same think even
though with a far small probability? Still a session with smaller hint
LSN but didn't entered the header lock section yet can be cut-in by
another session with larger hint LSN.

> for the reference attached is the trick I have used, adding an extra
> assertion check in PageSetLSN(). 

I believe that all locations where Page-LSN is set is in the same
buffer-ex-lock section with XLogInsert.. but not sure.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: MarkBufferDirtyHint() and LSN update

From
Antonin Houska
Date:
Michael Paquier <michael@paquier.xyz> wrote:

> On Mon, Nov 11, 2019 at 10:03:14AM +0100, Antonin Houska wrote:
> > This looks good to me.
> 
> Actually, no, this is not good.  I have been studying more the patch,
> and after stressing more this code path with a cluster having
> checksums enabled and shared_buffers at 1MB, I have been able to make
> a couple of page's LSNs go backwards with pgbench -s 100.  The cause
> was simply that the page got flushed with a newer LSN than what was
> returned by XLogSaveBufferForHint() before taking the buffer header
> lock, so updating only the LSN for a non-dirty page was simply
> guarding against that.

Interesting. Now that I know about the problem, I could have reproduced it
using gdb: MarkBufferDirtyHint() was called by 2 backends concurrently in such
a way that the first backend generates the LSN, but before it manages to
assign it to the page, another backend generates another LSN and sets it.

Can't we just apply the attached diff on the top of your patch?

Also I wonder how checksums helped you to discover the problem? Although I
could simulate the setting of lower LSN, I could not see any broken
checksum. And I wouldn't even expect that since FlushBuffer() copies the
buffer into backend local memory before it calculates the checksum.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachment

Re: MarkBufferDirtyHint() and LSN update

From
Antonin Houska
Date:
Antonin Houska <ah@cybertec.at> wrote:

> Michael Paquier <michael@paquier.xyz> wrote:
>
> > On Mon, Nov 11, 2019 at 10:03:14AM +0100, Antonin Houska wrote:
> > > This looks good to me.
> >
> > Actually, no, this is not good.  I have been studying more the patch,
> > and after stressing more this code path with a cluster having
> > checksums enabled and shared_buffers at 1MB, I have been able to make
> > a couple of page's LSNs go backwards with pgbench -s 100.  The cause
> > was simply that the page got flushed with a newer LSN than what was
> > returned by XLogSaveBufferForHint() before taking the buffer header
> > lock, so updating only the LSN for a non-dirty page was simply
> > guarding against that.
>
> Interesting. Now that I know about the problem, I could have reproduced it
> using gdb: MarkBufferDirtyHint() was called by 2 backends concurrently in such
> a way that the first backend generates the LSN, but before it manages to
> assign it to the page, another backend generates another LSN and sets it.
>
> Can't we just apply the attached diff on the top of your patch?

I wanted to register the patch for the next CF so it's not forgotten, but see
it's already there. Why have you set the status to "withdrawn"?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: MarkBufferDirtyHint() and LSN update

From
Michael Paquier
Date:
On Fri, Dec 20, 2019 at 04:30:38PM +0100, Antonin Houska wrote:
> I wanted to register the patch for the next CF so it's not forgotten, but see
> it's already there. Why have you set the status to "withdrawn"?

Because my patch was incorrect, and I did not make enough bandwidth to
think more on the matter.  I am actually not sure that what you are
proposing is better..  If you wish to get that reviewed, could you add
a new CF entry?
--
Michael

Attachment