On Thu, May 19, 2011 at 01:52:46PM +0100, Leonardo Francalanci wrote:
> > I'd guess some WAL record arising from the post-crash master restart makes the
> > standby do so. When a crash isn't involved, the commit or abort record is that
> > signal. You could test and find out how it happens after a master crash with a
> > procedure like this:
> >
> > 1. Start a master and standby on the same machine.
> > 2. Connect to master; CREATE TABLE t(); BEGIN; ALTER TABLE t ADD c int;
> > 3. kill -9 -`head -n1 $master_PGDATA/postmaster.pid`
> > 4. Connect to standby and confirm that t is still locked.
> > 5. Attach debugger to standby startup process and set breakpoints on
> > StandbyReleaseLocks and StandbyReleaseLocksMany.
> > 6. Restart master.
>
>
> Well yes, based on the test the stack is something like:
>
> StandbyReleaseLocksMany
> StandbyReleaseOldLocks
> ProcArrayApplyRecoveryInfo
> xlog_redo
>
> It's not very clear to me what ProcArrayApplyRecoveryInfo does (not too
> familiar with the standby part I guess) but I see it's called by xlog_redo in
> the "info == XLOG_CHECKPOINT_SHUTDOWN" case and by StartupXLOG.
>
> But I don't know if calling ResetUnloggedRelations before
> the call to ProcArrayApplyRecoveryInfo in xlog_redo makes sense...
> if it makes sense, it would solve the problem of the stray files in
> the master crashing case I guess?
It would solve the problem, but it would mean resetting unlogged relations on
the standby at every shutdown checkpoint. That's probably not a performance
problem, but it is a hack. Offhand, I'd add a new smgr WAL record issued by
ResetUnloggedRelations() when called with UNLOGGED_RELATION_CLEANUP. Another,
simpler, idea is to split XLOG_CHECKPOINT_SHUTDOWN into XLOG_CHECKPOINT_SHUTDOWN
and XLOG_CHECKPOINT_END_OF_RECOVERY, mirroring CreateCheckPoint()'s distinction.
(Given that I regularly lack good taste, you might want to wait for other people
to weigh in before spending too much time on that.)
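
To make the second idea a bit more concrete, here is a toy, standalone C sketch
of the redo-side dispatch. It is not PostgreSQL source and not a patch: every
toy_* name, the enum, and the counters are made up for illustration, and it
only assumes that the standby would reset unlogged relations on the
hypothetical end-of-recovery record while still releasing old locks on both
checkpoint flavors.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy record kinds; END_OF_RECOVERY is the hypothetical new record type. */
typedef enum ToyCheckpointKind
{
    TOY_CHECKPOINT_ONLINE,
    TOY_CHECKPOINT_SHUTDOWN,
    TOY_CHECKPOINT_END_OF_RECOVERY
} ToyCheckpointKind;

/* Counters standing in for the real side effects on the standby. */
static int toy_unlogged_resets = 0;
static int toy_lock_releases = 0;

/* Stand-in for resetting unlogged relations (hypothetical helper). */
static void
toy_reset_unlogged_relations(void)
{
    toy_unlogged_resets++;
}

/* Stand-in for releasing locks held by old transactions (hypothetical helper). */
static void
toy_release_old_locks(void)
{
    toy_lock_releases++;
}

/*
 * Replay one checkpoint record. Returns true if the record implies that no
 * earlier transaction can still be running on the master.
 */
static bool
toy_checkpoint_redo(ToyCheckpointKind kind)
{
    switch (kind)
    {
        case TOY_CHECKPOINT_END_OF_RECOVERY:
            /* Master went through crash recovery: clean up unlogged files. */
            toy_reset_unlogged_relations();
            toy_release_old_locks();
            return true;

        case TOY_CHECKPOINT_SHUTDOWN:
            /* Clean shutdown: unlogged relations are intact, skip the reset. */
            toy_release_old_locks();
            return true;

        case TOY_CHECKPOINT_ONLINE:
            /* Transactions may still be in flight; nothing to release. */
            return false;
    }

    assert(!"unknown checkpoint kind");
    return false;
}
```

Replaying TOY_CHECKPOINT_SHUTDOWN leaves toy_unlogged_resets untouched, so the
reset happens only after a master crash rather than at every shutdown
checkpoint.
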
> > > > When you promote the standby, though, ShutdownRecoveryTransactionEnvironment()
> > > > releases the locks.
>
>
> If I understand the code, ResetUnloggedRelations is called before
> ShutdownRecoveryTransactionEnvironment, so that part shouldn't be
> an issue
Seems correct.
nm