Re: clog_redo causing very long recovery time - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: clog_redo causing very long recovery time
Date	May 6, 2011 03:23:01
Msg-id	1756.1304652163@sss.pgh.pa.us
In response to	clog_redo causing very long recovery time (Joseph Conway <mail@joeconway.com>)
Responses	Re: clog_redo causing very long recovery time (Alvaro Herrera <alvherre@commandprompt.com>) Re: clog_redo causing very long recovery time (Joe Conway <mail@joeconway.com>) Re: clog_redo causing very long recovery time (Simon Riggs <simon@2ndQuadrant.com>)
List	pgsql-hackers

Joseph Conway <mail@joeconway.com> writes:
> I'm working with a client that uses Postgres on what amounts to an
> appliance.

> The database is therefore subject to occasional torture such as, in this
> particular case, running out of disk space while performing a million
> plus queries (of mixed varieties, many using plpgsql with exception
> handling -- more on that later), and eventually being power-cycled. Upon
> restart, clog_redo was called approx 885000 times (CLOG_ZEROPAGE) during
> recovery, which took almost 2 hours on their hardware. I should note
> that this is on Postgres 8.3.x.

> After studying the source, I can only see one possible way that this
> could have occurred:

> In varsup.c:GetNewTransactionId(), ExtendCLOG() needs to succeed on a
> freshly zeroed clog page, and ExtendSUBTRANS() has to fail. Both of
> these calls can lead to a page being pushed out of shared buffers and to
> disk, so given a lack of disk space, sufficient clog buffers, but lack
> of subtrans buffers, this could happen. At that point the transaction id
> does not get advanced, so clog zeros the same page, ExtendSUBTRANS()
> fails again, rinse and repeat.

> I believe in the case above, subtrans buffers were exhausted due to the
> extensive use of plpgsql with exception handling.

Hmm, interesting.  I believe that it's not really a question of buffer
space or lack of it, but whether the OS will give us disk space when we
try to add a page to the current pg_subtrans file.  In any case, the
point is that a failure between ExtendCLOG and incrementing nextXid
is bad news.

> The attached fix-clogredo diff is my proposal for a fix for this.

That seems pretty grotty :-(

I think a more elegant fix might be to just swap the order of the 
ExtendCLOG and ExtendSUBTRANS calls in GetNewTransactionId.  The
reason that would help is that pg_subtrans isn't WAL-logged, so if
we succeed doing ExtendSUBTRANS and then fail in ExtendCLOG, we
won't have written any XLOG entry, and thus repeated failures will not
result in repeated XLOG entries.  I seem to recall having considered
exactly that point when the clog WAL support was first done, but the
scenario evidently wasn't considered when subtransactions were stuck
in :-(.
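
To make the ordering argument concrete, here is a toy model in C. It is only a sketch, not PostgreSQL source: every name in it (ToyState, toy_extend_clog, toy_extend_subtrans, toy_get_new_xid, toy_wal_records_after_retries) is hypothetical. It assumes that each attempt lands on a page boundary, that the logged step emits exactly one WAL record per call, and that only the non-logged step can fail for lack of disk space.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of XID allocation; not PostgreSQL code. */
typedef struct {
    unsigned next_xid;     /* advanced only when both extend steps succeed */
    unsigned wal_records;  /* count of CLOG_ZEROPAGE-like records emitted */
    bool     disk_full;    /* when true, the non-logged extend step fails */
} ToyState;

/* WAL-logged step: zeroing a clog page emits one WAL record (assumption). */
static bool toy_extend_clog(ToyState *s)
{
    s->wal_records++;
    return true;
}

/* Non-logged step: fails when the disk is full and never writes WAL. */
static bool toy_extend_subtrans(ToyState *s)
{
    return !s->disk_full;
}

/* One allocation attempt; the flag selects the order of the two steps. */
static bool toy_get_new_xid(ToyState *s, bool subtrans_first)
{
    if (subtrans_first) {
        if (!toy_extend_subtrans(s))
            return false;          /* fails before any WAL is written */
        if (!toy_extend_clog(s))
            return false;
    } else {
        if (!toy_extend_clog(s))
            return false;
        if (!toy_extend_subtrans(s))
            return false;          /* WAL record already written above */
    }
    s->next_xid++;                 /* reached only on full success */
    return true;
}

/* Retry a failing allocation and report how many WAL records piled up. */
unsigned toy_wal_records_after_retries(bool subtrans_first, unsigned attempts)
{
    ToyState s = { .next_xid = 0, .wal_records = 0, .disk_full = true };

    for (unsigned i = 0; i < attempts; i++)
        (void) toy_get_new_xid(&s, subtrans_first);

    assert(s.next_xid == 0);       /* the counter never advances while full */
    return s.wal_records;
}
```

In this model, calling toy_wal_records_after_retries(false, n) leaves n WAL records for recovery to replay, while toy_wal_records_after_retries(true, n) leaves none, because the failure happens before the logged step is reached.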

It would probably also help to put in a comment admonishing people
to not add stuff right there.  I see the SSI guys have fallen into
the same trap.
        regards, tom lane