CLOG extension - Mailing list pgsql-hackers

From Robert Haas
Subject CLOG extension
Msg-id CA+Tgmoa6aZkd=HY=o41UrZqRXHFie5i3kw5HZRCEWhy0VXaXFg@mail.gmail.com
Responses Re: CLOG extension
List pgsql-hackers
Currently, the following can happen:

1. A backend needs a new transaction, so it calls
GetNewTransactionId().  It acquires XidGenLock and then calls
ExtendCLOG().
2. ExtendCLOG() decides that a new CLOG page is needed, so it acquires
CLogControlLock and then calls ZeroCLOGPage().
3. ZeroCLOGPage() calls WriteZeroPageXlogRec(), which calls XLogInsert().
4. XLogInsert() acquires WALInsertLock and then calls AdvanceXLInsertBuffer().
5. AdvanceXLInsertBuffer() sees that WAL buffers may be full and
acquires WALWriteLock to check, and possibly to write WAL if the
buffers are in fact full.
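
To make the accumulation of locks in steps 1-5 concrete, here is a toy trace of
the call chain. It is only an illustration, not PostgreSQL code: the function
and lock names come from the steps above, while `CALL_CHAIN` and
`locks_held_at_each_step` are hypothetical names made up for this sketch.

```python
# Toy model of the call chain in steps 1-5; not PostgreSQL source code.
# Each entry is (function entered, lock it acquires, or None).
CALL_CHAIN = [
    ("GetNewTransactionId", "XidGenLock"),
    ("ExtendCLOG", "CLogControlLock"),
    ("ZeroCLOGPage", None),
    ("WriteZeroPageXlogRec", None),
    ("XLogInsert", "WALInsertLock"),
    ("AdvanceXLInsertBuffer", "WALWriteLock"),
]


def locks_held_at_each_step(chain):
    """Return [(function, locks held on entry to its body), ...].

    Nothing is released while descending, so the held set only grows.
    """
    held = []
    trace = []
    for func, lock in chain:
        if lock is not None:
            held.append(lock)
        trace.append((func, tuple(held)))
    return trace


if __name__ == "__main__":
    for func, held in locks_held_at_each_step(CALL_CHAIN):
        print(f"{func}: {', '.join(held)}")
```

The last entry of the trace holds all four locks at once, which is the
situation discussed next.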

At this point, we have a single backend simultaneously holding
XidGenLock, CLogControlLock, WALInsertLock, and WALWriteLock, which
from a concurrency standpoint is, at the risk of considerable
understatement, not so great.  The situation is no better if (as seems
to be more typical) we block waiting for WALWriteLock rather than
actually holding it ourselves: either way, nobody can perform any
WAL-logged operation, get an XID, or consult CLOG - so all write
activity is blocked, and read activity will block as well as soon as
it hits an unhinted tuple.  This leads to a couple of questions.

First, do we really need to WAL-log CLOG extension at all?  Perhaps
recovery should simply extend CLOG when it hits a commit or abort
record that references a page that doesn't exist yet.
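
A rough sketch of what "extend on demand during recovery" might look like,
as a toy model rather than actual redo code. The assumptions are: a default
8 kB block size and 2 status bits per transaction (so 32768 XIDs per page),
one byte per XID instead of packed bits for readability, and the hypothetical
names `replay_status_record` and `clog_pages`.

```python
# Toy model only; assumes BLCKSZ = 8192 and 2 status bits per transaction.
XACTS_PER_PAGE = 8192 * 4  # 32768 XIDs per CLOG page

# Status values as used in CLOG.
IN_PROGRESS, COMMITTED, ABORTED = 0, 1, 2


def replay_status_record(clog_pages, xid, status):
    """Apply a commit/abort record, creating the CLOG page if it is missing.

    clog_pages maps page number -> bytearray with one byte per XID
    (a simplification; the real CLOG packs four XIDs into each byte).
    """
    pageno, offset = divmod(xid, XACTS_PER_PAGE)
    if pageno not in clog_pages:
        # No WAL record announced this page, so zero it here instead.
        clog_pages[pageno] = bytearray(XACTS_PER_PAGE)
    clog_pages[pageno][offset] = status


if __name__ == "__main__":
    pages = {}
    replay_status_record(pages, XACTS_PER_PAGE, COMMITTED)  # first XID on page 1
    print(sorted(pages), pages[1][0])
```

This says nothing about whether skipping the WAL record is actually safe; it
only shows the mechanics of the idea.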

Second, is there any harm in pre-extending CLOG?  Currently, we don't
extend CLOG until we get to the point where the XID we're allocating
is on a page that doesn't exist yet, so no further XIDs can be
assigned until the extension is complete.  We could avoid that by
extending a page in advance.  Right now, whenever a backend rolls onto
a new CLOG page, it must first create it.  What we could do instead is
try to stay one page ahead of whatever we're currently using: whenever
a backend rolls onto a new CLOG page, it creates *the next page*.
That way, it can release XidGenLock first and *then* call
ExtendCLOG().  That allows all the other backends to continue
allocating XIDs in parallel with the CLOG extension.  In theory we
could still get a pile-up if the entire page worth of XIDs gets used
up before we can finish the extension, but that should be pretty rare.
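
The "stay one page ahead" idea can be sketched as a small toy allocator. This
is an illustration under assumptions, not a patch: Python locks stand in for
XidGenLock and CLogControlLock, a page is just a number in a set, 32768 XIDs
per page assumes the default 8 kB block size, and `XidAllocator` and
`_ensure_page` are hypothetical names.

```python
import threading

# Assumes BLCKSZ = 8192 and 2 status bits per transaction.
XACTS_PER_PAGE = 8192 * 4


class XidAllocator:
    """Toy model: creates page N+1 when the first XID on page N is handed out."""

    def __init__(self):
        self.xid_gen_lock = threading.Lock()   # stands in for XidGenLock
        self.clog_lock = threading.Lock()      # stands in for CLogControlLock
        self.next_xid = 0
        self.pages = set()                     # CLOG pages that exist

    def _ensure_page(self, pageno):
        with self.clog_lock:
            self.pages.add(pageno)             # "zero" the page if missing

    def get_new_xid(self):
        with self.xid_gen_lock:
            xid = self.next_xid
            pageno, offset = divmod(xid, XACTS_PER_PAGE)
            if offset == 0:
                # Fallback for the pile-up case: the pre-extension of this
                # page has not finished (or never ran), so do it while
                # still holding the XID lock. Normally this is a no-op.
                self._ensure_page(pageno)
            self.next_xid = xid + 1
        if offset == 0:
            # XID lock already released: other backends keep allocating
            # XIDs on the current page while the next one is created.
            self._ensure_page(pageno + 1)
        return xid
```

In this model the expensive work happens under the XID lock only when a whole
page of XIDs is consumed before the pre-extension completes.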

(Alternatively, we could introduce a separate background process to
extend CLOG, and just have foreground processes kick it periodically.
This currently seems like overkill to me.)

Third, assuming we do need to write WAL, can we somehow rejigger the
logging so that we need not hold CLogControlLock while we're writing
it, so that other backends can still do CLOG lookups during that time?
Maybe when we take CLogControlLock and observe that extension is
needed, we can release CLogControlLock, WAL-log the extension, and
then retake CLogControlLock to do SimpleLruZeroPage().  We might need
a separate CLogExtensionLock to make sure that two different backends
aren't trying to do this dance at the same time, but that should be
largely uncontended.
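
A minimal sketch of that release-and-retake dance, again as a toy model with
Python locks. CLogExtensionLock does not exist today; it is the hypothetical
lock proposed above. Taking it before the first look at the page set is a
choice made for this sketch, and the `Clog` class, its `wal` list, and the
`"zero_page"` record name are likewise invented for illustration.

```python
import threading


class Clog:
    """Toy model of extending CLOG without holding the control lock
    while the WAL record is written."""

    def __init__(self):
        self.control_lock = threading.Lock()    # stands in for CLogControlLock
        self.extension_lock = threading.Lock()  # hypothetical CLogExtensionLock
        self.pages = set()                      # CLOG pages that exist
        self.wal = []                           # stand-in for the WAL stream

    def extend(self, pageno):
        """Create pageno if needed; return True if this call created it."""
        # Serializes extenders only; lookups never take this lock.
        with self.extension_lock:
            with self.control_lock:
                if pageno in self.pages:
                    return False
            # Control lock is released here, so other backends can still
            # do CLOG lookups while the (possibly slow) WAL write happens.
            self.wal.append(("zero_page", pageno))
            with self.control_lock:
                self.pages.add(pageno)          # SimpleLruZeroPage() step
            return True

    def page_exists(self, pageno):
        with self.control_lock:
            return pageno in self.pages
```

Because the extension lock is held across both critical sections, a second
backend arriving mid-extension waits there and then finds the page already
present, so the page is logged and zeroed only once.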

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

