Thread: xlog min recovery request ... is past current point ...

xlog min recovery request ... is past current point ...

From
Christophe Pettus
Date:
PostgreSQL 9.0.4:

While bringing up a streaming replica, and while it is working its way through the WAL segments before connecting to
theprimary, I see a lot of messages of the form: 

2012-02-01 21:26:13.978 PST,,,24448,,4f2a1e61.5f80,54,,2012-02-01 21:25:53 PST,1/0,0,LOG,00000,"restored log file
""0000000100000DB400000065""from archive",,,,,,,,,"" 
2012-02-01 21:26:14.032 PST,,,24448,,4f2a1e61.5f80,55,,2012-02-01 21:25:53 PST,1/0,0,WARNING,01000,"xlog min recovery
requestDB5/42E15098 is past current point DB4/657FA490",,,,,"writing block 5 of relation base/155650/156470_vm 
xlog redo insert: rel 1663/155650/1658867; tid 9640/53",,,,""
2012-02-01 21:26:14.526 PST,,,24448,,4f2a1e61.5f80,56,,2012-02-01 21:25:53 PST,1/0,0,LOG,00000,"restored log file
""0000000100000DB400000066""from archive",,,,,,,,,"" 

All of these are on _vm relations.  The recovery completed successfully and the secondary connected to the primary
withoutissue, so: Are these messages something to be concerned over? 

--
-- Christophe Pettus xof@thebuild.com



Re: xlog min recovery request ... is past current point ...

From
Heikki Linnakangas
Date:
On 03.02.2012 18:32, Christophe Pettus wrote:
> PostgreSQL 9.0.4:
>
> While bringing up a streaming replica, and while it is working its way through the WAL segments before connecting to
theprimary, I see a lot of messages of the form:
 
>
> 2012-02-01 21:26:13.978 PST,,,24448,,4f2a1e61.5f80,54,,2012-02-01 21:25:53 PST,1/0,0,LOG,00000,"restored log file
""0000000100000DB400000065""from archive",,,,,,,,,""
 
> 2012-02-01 21:26:14.032 PST,,,24448,,4f2a1e61.5f80,55,,2012-02-01 21:25:53 PST,1/0,0,WARNING,01000,"xlog min recovery
requestDB5/42E15098 is past current point DB4/657FA490",,,,,"writing block 5 of relation base/155650/156470_vm
 
> xlog redo insert: rel 1663/155650/1658867; tid 9640/53",,,,""
> 2012-02-01 21:26:14.526 PST,,,24448,,4f2a1e61.5f80,56,,2012-02-01 21:25:53 PST,1/0,0,LOG,00000,"restored log file
""0000000100000DB400000066""from archive",,,,,,,,,""
 
>
> All of these are on _vm relations.  The recovery completed successfully and the secondary connected to the primary
withoutissue, so: Are these messages something to be concerned over?
 

Hmm, I think I see how that can happen:

0. A heap page has its bit set in visibility map to begin with

1. A heap tuple is inserted/updated/deleted. This clears the VM bit.
2. time passes, and more WAL is generated
3. The page is vacuumed, and the visibility map bit is set again.

In the standby, this can happen while replaying the WAL, if you restart 
the standby so that some WAL is re-replayed:

1. The update of the heap tuple is replayed. This clears the VM bit.
2. The VACUUM is replayed, setting the VM bit again, and updating the VM 
page's LSN.
3. Shutdown and restart standby
4. The heap update is replayed again. This again clears the VM bit, but 
does not set the LSN

If the VM page is now evicted from the buffer cache, you get the WARNING 
you saw, because the page is dirty, yet its LSN is beyond the current 
point in recovery.

AFAICS that's totally harmless, but the warning is quite alarming, so 
we'll have to figure out a way to fix that. Not sure how; perhaps we 
need to set the LSN on the VM page when the VM bit is cleared, but I 
don't remember off the top of my head if there was some important reason 
why we don't do that currently.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: xlog min recovery request ... is past current point ...

From
Robert Haas
Date:
On Sun, Mar 4, 2012 at 2:41 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Hmm, I think I see how that can happen:
>
> 0. A heap page has its bit set in visibility map to begin with
>
> 1. A heap tuple is inserted/updated/deleted. This clears the VM bit.
> 2. time passes, and more WAL is generated
> 3. The page is vacuumed, and the visibility map bit is set again.

Note that on 9.0.X, which Christophe is using, the setting of the
visibility map bit in step #3 is not WAL-logged.

> In the standby, this can happen while replaying the WAL, if you restart the
> standby so that some WAL is re-replayed:
>
> 1. The update of the heap tuple is replayed. This clears the VM bit.
> 2. The VACUUM is replayed, setting the VM bit again, and updating the VM
> page's LSN.

Therefore I think this won't happen either, on that version.  Do we
somehow emit an FPI for the VM page?

> 3. Shutdown and restart standby
> 4. The heap update is replayed again. This again clears the VM bit, but does
> not set the LSN
>
> If the VM page is now evicted from the buffer cache, you get the WARNING you
> saw, because the page is dirty, yet its LSN is beyond the current point in
> recovery.
>
> AFAICS that's totally harmless, but the warning is quite alarming, so we'll
> have to figure out a way to fix that. Not sure how; perhaps we need to set
> the LSN on the VM page when the VM bit is cleared, but I don't remember off
> the top of my head if there was some important reason why we don't do that
> currently.

I suspect that it was never done just because there was no clear
benefit, since heap_{insert,update,delete} all clear the bit
regardless of the page LSN.  But this might be a reason to do it.  I
can't swear it's safe, though, although I also can't see why it
wouldn't be.  Note also that 9.2devel behaves quite differently than
previous releases: every visibilitymap_set is WAL-logged and bumps the
vm page's LSN; whereas in prior releases no WAL record is emitted and
the vm page's LSN is advanced to the heap page's LSN if it lags it.
So we'd better think pretty carefully before assuming that any logic
about what is safe here is true for all branches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] xlog min recovery request ... is past current point ...

From
Robert Haas
Date:
[ blast-from-the-past department ]

On Fri, Feb 3, 2012 at 11:33 AM Christophe Pettus <xof@thebuild.com> wrote:
> While bringing up a streaming replica, and while it is working its way through the WAL segments before connecting to
theprimary, I see a lot of messages of the form:
 
>
> 2012-02-01 21:26:13.978 PST,,,24448,,4f2a1e61.5f80,54,,2012-02-01 21:25:53 PST,1/0,0,LOG,00000,"restored log file
""0000000100000DB400000065""from archive",,,,,,,,,""
 
> 2012-02-01 21:26:14.032 PST,,,24448,,4f2a1e61.5f80,55,,2012-02-01 21:25:53 PST,1/0,0,WARNING,01000,"xlog min recovery
requestDB5/42E15098 is past current point DB4/657FA490",,,,,"writing block 5 of relation base/155650/156470_vm
 
> xlog redo insert: rel 1663/155650/1658867; tid 9640/53",,,,""
> 2012-02-01 21:26:14.526 PST,,,24448,,4f2a1e61.5f80,56,,2012-02-01 21:25:53 PST,1/0,0,LOG,00000,"restored log file
""0000000100000DB400000066""from archive",,,,,,,,,""
 
>
> All of these are on _vm relations.  The recovery completed successfully and the secondary connected to the primary
withoutissue, so: Are these messages something to be concerned over?
 

It appears that we never did anything about this, and I just came
across some evidence that it's still a live issue. And, looking at the
code, I can't see any reason why it shouldn't be. I found Heikki's
list of steps (in his 2012 reply) a bit difficult to understand, so
here's my attempt to restate the scenario he describes:

Suppose that a user begins a base backup. While the base backup is
running, the user performs a DML-operation on an all-visible page,
clearing the visibility map bit. Then, they run VACUUM, resetting the
bit. The page is copied after both of these things happen. Then, the
base backup ends. Next, the user tries to start up the standby. When
the DML operation is replayed, the visibility map bit is cleared
without changing the page LSN (e.g. heap_xlog_delete ->
visibilitymap_clear). If the buffer is now flushed out of
shared_buffers before the VACUUM is replayed, then we can get the
complaint mentioned in $SUBJECT.

OK, so what could/should we do about this?

VM pages are subject to a somewhat unusual scheme with regard to
write-ahead logging: to minimize WAL traffic, records that clear the
visibilitymap bit in passing do not register the buffer, meaning that
such records generate no full-page images for the visibility map.
Records that set the visibility map bit do register the buffer, but
suppress full-page images unless checksums or wal_log_hints are in
use. At least if checksums are not in use, this is reasonable, because
modifications to VM pages don't rely on the existing page contents, so
even if the page is torn, replay need not get confused, as long as it
ignores the page LSN when clearing the bit. And that's just what
heap_xlog_delete() and friends actually do.  It's acceptable but
useless to pay attention to the page LSN when setting a bit, because
the only thing we can do is decide to not set a bit that should have
been set, which doesn't break anything, and in fact
heap_xlog_visible() does pay attention to the page LSN "out of an
abundance of conservatism" -- a comment I wrote originally, and with
which I'm no longer very impressed, because it seems like it's just
making things more confusing.

The overall picture here is that the LSN of the VM page matters very
little. If heap_xlog_visible() didn't have that "if (lsn >
PageGetLSN(vmpage))" test guarding visibilitymap_set here, the rule
would be simple: the LSN of a VM page would have no effect on replay
of related WAL records, period. However, it would probably make the
warning that is the subject of this thread more common. Imagine the
same scenario outlined above but, instead of the bit starting out set
and then being cleared and set again, it starts out cleared and gets
set and cleared again. The rest is the same: the page is copied by the
base backup in its final form. As things stand, that if-test prevents
the VM bit from being set, which also means the page LSN is not set
nor is the page dirtied, which means that evicting the page can't
trigger this warning.

But, regardless of whether we actually remove that if-test,
considering it as a hypothetical seems useful. If we actually had a
rule that the page LSN doesn't really affect whether actions affecting
VM pages are replayed, then we would just need visibilitymap_clear()
to set the page LSN just like visibilitymap_set() does. The only thing
the page LSN would be doing is causing the warning which is the
subject of the thread to get generated, or not, so it would make sense
to just set in a way that avoids the warning, since it matters to
nothing else.

However, checksums seem to complicate the situation considerably.
Ignore everything proposed above and suppose that, with checksums
enabled, the user deletes a tuple from a page marked all-visible,
clearing the VM bit. The dirty page is flushed to the OS. As it is
being written back to durable storage, the system crashes, tearing the
page. There's no full-page image in the WAL to fix things because the
buffer isn't even registered, so I guess the next access to that VM
page just errors out. Am I missing something? If not, perhaps the
thing to do here is go the other direction and make the WAL logging
for VM pages more standard: always register the buffer, take full page
images just like any other buffer, and make the decision as to whether
or not to replay VM actions fully conditional on the page LSN. The
visibility map is so small that the additional volume of full-page
writes seems unlikely to matter very much. There's an additional
downside in that registering an additional buffer makes the WAL
records bigger, and making the WAL records for insert, update, and
delete bigger is pretty undesirable, but I think we could do it just
in the case where the bit actually needs to be cleared, which ought to
be comparatively rare, so maybe it's fine.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company