hot standby startup, visibility map, clog - Mailing list pgsql-hackers

From	Daniel Farina
Subject	hot standby startup, visibility map, clog
Date	June 9, 2011 10:12:39
Msg-id	BANLkTinCXfATbbPdXpz1OMW9A1Terg9hLQ@mail.gmail.com
Responses	Re: hot standby startup, visibility map, clog
List	pgsql-hackers

Hello list,

A little while ago I posted about how my ... exciting ....
backup procedure caused occasional problems starting due to clog not
being big enough.
(http://archives.postgresql.org/pgsql-hackers/2011-04/msg01148.php) I
recently had a reproduction and a little bit of luck, and I think I
have a slightly better idea of what may be causing this.

The first fact is that turning off hot standby will let the cluster
start up, but only after seeing a spate of messages like these (a dozen
or dozens, not thousands):

2011-06-09 08:02:32 UTC  LOG:  restored log file
"000000020000002C000000C0" from archive
2011-06-09 08:02:33 UTC  WARNING:  xlog min recovery request
2C/C1F09658 is past current point 2C/C037B278
2011-06-09 08:02:33 UTC  CONTEXT:  writing block 0 of relation
base/16385/16784_vm
xlog redo insert: rel 1663/16385/128029; tid 114321/63
2011-06-09 08:02:33 UTC  LOG:  restartpoint starting: xlog
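
For anyone comparing the two positions in that WARNING by hand, here is
a minimal Python sketch that pulls them out and orders them. It is only
an illustration: the function names are made up for this example, it
assumes each log message sits on one line (unlike the wrapped quote
above), and it compares positions as (xlogid, xrecoff) pairs rather
than computing byte distances.

```python
import re

# Matches the WARNING text quoted above. Positions are printed as
# "xlogid/xrecoff", both halves in hexadecimal.
WARNING_RE = re.compile(
    r"xlog min recovery request ([0-9A-Fa-f]+/[0-9A-Fa-f]+) "
    r"is past current point ([0-9A-Fa-f]+/[0-9A-Fa-f]+)"
)


def parse_lsn(text):
    """Turn 'xlogid/xrecoff' hex text into an (xlogid, xrecoff) tuple."""
    hi, lo = text.split("/")
    return int(hi, 16), int(lo, 16)


def request_ahead_of_current(message):
    """Return True if the requested position is past the current one.

    Returns None when the message is not one of these warnings.
    Hypothetical helper for illustration only.
    """
    m = WARNING_RE.search(message)
    if m is None:
        return None
    requested = parse_lsn(m.group(1))
    current = parse_lsn(m.group(2))
    # Tuples compare element by element, so xlogid is compared first.
    return requested > current
```

Comparing tuples keeps the ordering right without assuming anything
about how many bytes one xlogid spans.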

Most importantly, *all* such messages are in visibility map forks
(_vm).  I am reasonably confident that my code does not start reading
data until pg_start_backup() has returned, and blocks on
pg_stop_backup() after having read all the data.  Also, the mailing
list correspondence at
http://archives.postgresql.org/pgsql-hackers/2010-11/msg02034.php
suggests that the visibility map is not flushed at checkpoints, so
perhaps with some poor timing an old page can wander onto disk even
after a checkpoint barrier that pg_start_backup waits for. (I have not
yet found the critical section that makes visibility map buffers immune
to checkpoints, though).
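
A rough way to check the claim that every such message concerns a
visibility map fork is to tally the relation paths in the CONTEXT
lines. The sketch below is hypothetical helper code, not anything from
PostgreSQL itself; it assumes unwrapped log lines and that a fork is
recognizable by a _vm or _fsm suffix on the relation path.

```python
import re
from collections import Counter

# Matches the CONTEXT text quoted earlier and captures the relation path.
CONTEXT_RE = re.compile(r"writing block (\d+) of relation (\S+)")


def fork_of(relpath):
    """Classify a relation path by its fork suffix (assumed naming)."""
    for suffix in ("_vm", "_fsm"):
        if relpath.endswith(suffix):
            return suffix[1:]
    return "main"


def count_forks(log_lines):
    """Count 'writing block' CONTEXT lines per fork type."""
    counts = Counter()
    for line in log_lines:
        m = CONTEXT_RE.search(line)
        if m:
            counts[fork_of(m.group(2))] += 1
    return counts
```

If the observation holds, count_forks() over the recovery log should
report only the "vm" key.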

Given all that, if the smgr's generic read path that checks the LSN
and possibly the clog (but apparently only in hot standby mode, since
pre-hot-standby the clog's intermediate states were not so
interesting...) has a problem with such uncheckpointed pages, then it
would make sense that the system now refuses to start, whereas it once
started without complaint.

FWIW, letting recovery run without hot standby for a little while,
canceling, and then starting again after the danger zone had passed
would allow recovery to proceed correctly, as one might expect.

Thoughts?

-- 
fdr
