Thread: pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic

pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic

From
Andres Freund
Date:
Do all-visible handling in lazy_vacuum_page() outside its critical section.

Since fdf9e21196a lazy_vacuum_page() rechecks the all-visible status
of pages in the second pass over the heap. It does so inside a
critical section, but both visibilitymap_test() and
heap_page_is_all_visible() perform operations that should not happen
inside one. The former potentially performs IO and both potentially do
memory allocations.

To fix, simply move all the all-visible handling outside the critical
section. Doing so means that the PD_ALL_VISIBLE on the page won't be
included in the full page image of the HEAP2_CLEAN record anymore. But
that's fine, the flag will be set by the HEAP2_VISIBLE logged later.

Backpatch to 9.3 where the problem was introduced. The bug only came
to light due to the assertion added in 4a170ee9 and isn't likely to
cause problems in production scenarios. The worst outcome is a
avoidable PANIC restart.

This also gets rid of the difference in the order of operations
between master and standby mentioned in 2a8e1ac5.

Per reports from David Leverton and Keith Fiske in bug #10533.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/ecac0e2b9e8e8e78d771b20fe441e95bb02db2fa

Modified Files
--------------
src/backend/commands/vacuumlazy.c |   26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)


On Fri, 2014-06-20 at 09:10 +0000, Andres Freund wrote:
> To fix, simply move all the all-visible handling outside the critical
> section. Doing so means that the PD_ALL_VISIBLE on the page won't be
> included in the full page image of the HEAP2_CLEAN record anymore. But
> that's fine, the flag will be set by the HEAP2_VISIBLE logged later.

Trying to follow along. I read the discussion about bug #10533.

Before 2a8e1ac5, a block of actions -- visibility test, VM test, set VM
bit, set all-visible -- were inside the critical section and after the
WAL insert.

2a8e1ac5 left the block of actions in the critical section but moved
them before the WAL insert. The commit message seems to indicate that
there's a real problem setting the all-visible flag after the WAL
insert, but I couldn't identify a serious problem there.

This patch moves the block of actions back after the WAL insert, but
also puts them outside of the critical section.

If I understand correctly, you essentially reverted 2a8e1ac5 (which
implies that you think it didn't fix a real problem) and then fixed a
real problem that was missed by 2a8e1ac5. Is that correct?

> Backpatch to 9.3 where the problem was introduced. The bug only came
> to light due to the assertion added in 4a170ee9 and isn't likely to
> cause problems in production scenarios. The worst outcome is a
> avoidable PANIC restart.
>
> This also gets rid of the difference in the order of operations
> between master and standby mentioned in 2a8e1ac5.

Can you explain? It looks like the master will still see the bit right
away after the HEAP2_CLEAN record, but there is nothing in the
HEAP2_CLEAN record to tell the standby to set all-visible. The standby
has to wait until the HEAP2_VISIBLE record comes along.

I'm a little concerned that we're approaching the WAL rules in an ad-hoc
manner. I don't see an actual problem right now, but we've been fixing
problems with PD_ALL_VISIBLE since the VM was made crash safe almost
exactly three years ago (503c7305). Do we finally feel confident in the
design and its implications?

With checksums, Simon and I tried to create some new abstractions, like
MarkBufferDirtyHint, and document them in transam/README. I hope that
gives us some greater confidence in the checksums code[1], because we
can see a lot of the tricky aspects in one place, along with the rules
that callers should follow.

A quick look at the comment above the function tells us that
MarkBufferDirtyHint won't work for the all-visible bit, but perhaps an
adapted version could work. If so, we could separate the all-visible bit
from the HEAP2_CLEAN action, and it would look a lot more like setting a
tuple hint. Also, we might be able to simplify visibilitymap_set if the
logging of the heap page for checksums could be done beforehand with
this variant of MarkBufferDirtyHint.

Regards,
    Jeff Davis

[1] Not that we didn't make mistakes; at least one of which you
heroically diagnosed just in time.




Re: pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic

From
Andres Freund
Date:
Hi,

On 2014-06-23 13:00:11 -0700, Jeff Davis wrote:
> On Fri, 2014-06-20 at 09:10 +0000, Andres Freund wrote:
> > To fix, simply move all the all-visible handling outside the critical
> > section. Doing so means that the PD_ALL_VISIBLE on the page won't be
> > included in the full page image of the HEAP2_CLEAN record anymore. But
> > that's fine, the flag will be set by the HEAP2_VISIBLE logged later.
>
> Trying to follow along. I read the discussion about bug #10533.
>
> Before 2a8e1ac5, a block of actions -- visibility test, VM test, set VM
> bit, set all-visible -- were inside the critical section and after the
> WAL insert.
>
> 2a8e1ac5 left the block of actions in the critical section but moved
> them before the WAL insert. The commit message seems to indicate that
> there's a real problem setting the all-visible flag after the WAL
> insert, but I couldn't identify a serious problem there.

Yes, that was confusing. Imo it made the state worse, not better. I'd
asked for clarification somewhere... Heikki? Does your change still make
sense to you and do you see problem with the current state (as of ecac0e2b)?

> If I understand correctly, you essentially reverted 2a8e1ac5 (which
> implies that you think it didn't fix a real problem) and then fixed a
> real problem that was missed by 2a8e1ac5. Is that correct?

Well, I wouldn't say 2a8e1ac5 'missed' that problem. It tried to solve
something else (which it imo didn't). But yes, it didn't really improve
the state with regard to doing dangerous things in a critical section.

> > Backpatch to 9.3 where the problem was introduced. The bug only came
> > to light due to the assertion added in 4a170ee9 and isn't likely to
> > cause problems in production scenarios. The worst outcome is a
> > avoidable PANIC restart.
> >
> > This also gets rid of the difference in the order of operations
> > between master and standby mentioned in 2a8e1ac5.
>
> Can you explain? It looks like the master will still see the bit right
> away after the HEAP2_CLEAN record, but there is nothing in the
> HEAP2_CLEAN record to tell the standby to set all-visible. The standby
> has to wait until the HEAP2_VISIBLE record comes along.

Nobody can see the page in that state on the master - it'll be locked
exclusively. But that's not really the point.

What could happen with 2a8e1ac5's coding is that log_heap_clean()'s
XLogInsert() took a full page image which then contained the all visible
bit because the PageSetAllVisible() was done *before* it. The
standby would have replayed the HEAP2_CLEAN including the FPI and then
unlocked the page. At that point the heap page's all visible will be set,
but the vm bit wouldn't.
With the new coding (ecac0e2b9) the HEAP2_CLEAN cannot log the all
visible bit because it's not set yet. Only the HEAP2_VISIBLE will set
it on both the heap and the vm. Makes sense?

> I'm a little concerned that we're approaching the WAL rules in an ad-hoc
> manner. I don't see an actual problem right now, but we've been fixing
> problems with PD_ALL_VISIBLE since the VM was made crash safe almost
> exactly three years ago (503c7305). Do we finally feel confident in the
> design and its implications?

I'm not particularly happy about all this either, but this section of
code is actually much newer. It was added in fdf9e21196a6/9.3.

> With checksums, Simon and I tried to create some new abstractions, like
> MarkBufferDirtyHint, and document them in transam/README. I hope that
> gives us some greater confidence in the checksums code[1], because we
> can see a lot of the tricky aspects in one place, along with the rules
> that callers should follow.

Hm. I don't think the current callsites dealing with this are easily
amenable to something like that.

> A quick look at the comment above the function tells us that
> MarkBufferDirtyHint won't work for the all-visible bit, but perhaps an
> adapted version could work. If so, we could separate the all-visible bit
> from the HEAP2_CLEAN action, and it would look a lot more like setting a
> tuple hint.

Huh? HEAP2_CLEAN doesn't set all visible? The only reason we're now
doing a HEAP2_VISIBLE there is that removing the tuples in the _CLEAN
step increases the chance of it the page now being all visible? It's
separate steps.

> Also, we might be able to simplify visibilitymap_set if the
> logging of the heap page for checksums could be done beforehand with
> this variant of MarkBufferDirtyHint.

Hm. I have a hard time seing how that'd simplify things.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic

From
Heikki Linnakangas
Date:
On 06/24/2014 01:27 AM, Andres Freund wrote:
> On 2014-06-23 13:00:11 -0700, Jeff Davis wrote:
>> On Fri, 2014-06-20 at 09:10 +0000, Andres Freund wrote:
>>> To fix, simply move all the all-visible handling outside the critical
>>> section. Doing so means that the PD_ALL_VISIBLE on the page won't be
>>> included in the full page image of the HEAP2_CLEAN record anymore. But
>>> that's fine, the flag will be set by the HEAP2_VISIBLE logged later.
>>
>> Trying to follow along. I read the discussion about bug #10533.
>>
>> Before 2a8e1ac5, a block of actions -- visibility test, VM test, set VM
>> bit, set all-visible -- were inside the critical section and after the
>> WAL insert.
>>
>> 2a8e1ac5 left the block of actions in the critical section but moved
>> them before the WAL insert. The commit message seems to indicate that
>> there's a real problem setting the all-visible flag after the WAL
>> insert, but I couldn't identify a serious problem there.
>
> Yes, that was confusing. Imo it made the state worse, not better. I'd
> asked for clarification somewhere... Heikki?

Hmm. I don't see any live bug caused before 2a8e1ac5 anymore either. I
think I missed the fact that replaying the XLOG_HEAP2_VISIBLE record
always sets the bit in the heap page.

> Does your change still make
> sense to you and do you see problem with the current state (as of ecac0e2b)?

Hmm, in the current state, it's again possible that the full-page image
doesn't contain the all-visible flag, even though the page in the buffer
does. I guess that's OK, because replaying XLOG_HEAP2_VISIBLE always
sets the flag. My page-comparison tool will complain, so it would be
nice to not do that, but it's a false alarm.

Or we could call heap_page_is_all_visible() before entering the critical
section, if we passed heap_page_is_all_visible() the list of dead tuples
we're removing so that it could ignore them.

>> I'm a little concerned that we're approaching the WAL rules in an ad-hoc
>> manner. I don't see an actual problem right now, but we've been fixing
>> problems with PD_ALL_VISIBLE since the VM was made crash safe almost
>> exactly three years ago (503c7305). Do we finally feel confident in the
>> design and its implications?
>
> I'm not particularly happy about all this either, but this section of
> code is actually much newer. It was added in fdf9e21196a6/9.3.

I'm not happy with the way we deviate from the normal WAL rules with in
the visibility map either. In the long run, we may be better off to just
take the hit and WAL-log the VM bits like any other update. Basically,
always behave as if checksums were enabled. But for now we just have to
fix bugs as they're found.

- Heikki


On Tue, 2014-06-24 at 00:27 +0200, Andres Freund wrote:
> > Can you explain? It looks like the master will still see the bit right
> > away after the HEAP2_CLEAN record, but there is nothing in the
> > HEAP2_CLEAN record to tell the standby to set all-visible. The standby
> > has to wait until the HEAP2_VISIBLE record comes along.
>
> Nobody can see the page in that state on the master - it'll be locked
> exclusively. But that's not really the point.
>
> What could happen with 2a8e1ac5's coding is that log_heap_clean()'s
> XLogInsert() took a full page image which then contained the all visible
> bit because the PageSetAllVisible() was done *before* it. The
> standby would have replayed the HEAP2_CLEAN including the FPI and then
> unlocked the page. At that point the heap page's all visible will be set,
> but the vm bit wouldn't.
> With the new coding (ecac0e2b9) the HEAP2_CLEAN cannot log the all
> visible bit because it's not set yet. Only the HEAP2_VISIBLE will set
> it on both the heap and the vm. Makes sense?

Yes, it makes much more sense now, thank you.

> > With checksums, Simon and I tried to create some new abstractions, like
> > MarkBufferDirtyHint, and document them in transam/README. I hope that
> > gives us some greater confidence in the checksums code[1], because we
> > can see a lot of the tricky aspects in one place, along with the rules
> > that callers should follow.
>
> Hm. I don't think the current callsites dealing with this are easily
> amenable to something like that.

I haven't thought through the details, but I think that setting
PD_ALL_VISIBLE is too tricky of an operation to have different callers
doing different things.

It seems like there should be a way for all callers (there are only a
few) to set PD_ALL_VISIBLE and the VM bit the same way.

> > A quick look at the comment above the function tells us that
> > MarkBufferDirtyHint won't work for the all-visible bit, but perhaps an
> > adapted version could work. If so, we could separate the all-visible bit
> > from the HEAP2_CLEAN action, and it would look a lot more like setting a
> > tuple hint.
>
> Huh? HEAP2_CLEAN doesn't set all visible? The only reason we're now
> doing a HEAP2_VISIBLE there is that removing the tuples in the _CLEAN
> step increases the chance of it the page now being all visible? It's
> separate steps.

When setting all-visible was part of the same critical section doing the
logging for HEAP2_CLEAN, it seemed to be a part of that action. But
you're right: now it's more separate.

Regards,
    Jeff Davis




Re: pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic

From
Andres Freund
Date:
On 2014-06-26 10:39:01 +0300, Heikki Linnakangas wrote:
> On 06/24/2014 01:27 AM, Andres Freund wrote:
> >Does your change still make
> >sense to you and do you see problem with the current state (as of ecac0e2b)?
>
> Hmm, in the current state, it's again possible that the full-page image
> doesn't contain the all-visible flag, even though the page in the buffer
> does. I guess that's OK, because replaying XLOG_HEAP2_VISIBLE always sets
> the flag. My page-comparison tool will complain, so it would be nice to not
> do that, but it's a false alarm.

I don't see where that difference should come from? On both the primary
and standby neither the page nor the vm will have all-visible bit set
after the HEAP2_CLEAN. Both will have it set after the
HEAP2_VISIBLE.
The only chance for a disparity that I see is when crashing after
PageSetAllVisible(), but before the XLogInsert(). That's fairly harmless
afaics and I don't think your tool would pick that up?

I think we could fix it by doing something like

if (!PageIsAllVisible(page) && heap_page_is_all_visible())
{
     visibilitymap_pin(onerel, blkno, vmbuffer)

     START_CRIT_SECTION();
     PageSetAllVisible(page);
     if (!visibilitymap_test(onerel, blkno, vmbuffer))
     {
        visibilitymap_set(onerel, blkno, buffer, InvalidXLogRecPtr, *vmbuffer,
                          visibility_cutoff_xid);
     }
     END_CRIT_SECTION();
}

If we care.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services