CLOG extension - Mailing list pgsql-hackers
From:     Robert Haas
Subject:  CLOG extension
Msg-id:   CA+Tgmoa6aZkd=HY=o41UrZqRXHFie5i3kw5HZRCEWhy0VXaXFg@mail.gmail.com
List:     pgsql-hackers
Currently, the following can happen:

1. A backend needs a new transaction, so it calls GetNewTransactionId(). It acquires XidGenLock and then calls ExtendCLOG().
2. ExtendCLOG() decides that a new CLOG page is needed, so it acquires CLogControlLock and then calls ZeroCLOGPage().
3. ZeroCLOGPage() calls WriteZeroPageXlogRec(), which calls XLogInsert().
4. XLogInsert() acquires WALInsertLock and then calls AdvanceXLInsertBuffer().
5. AdvanceXLInsertBuffer() sees that WAL buffers may be full and acquires WALWriteLock to check, and possibly to write WAL if the buffers are in fact full.

At this point, we have a single backend simultaneously holding XidGenLock, CLogControlLock, WALInsertLock, and WALWriteLock, which from a concurrency standpoint is, at the risk of considerable understatement, not so great. The situation is no better if (as seems to be more typical) we block waiting for WALWriteLock rather than actually holding it ourselves: either way, nobody can perform any WAL-logged operation, get an XID, or consult CLOG - so all write activity is blocked, and read activity will block as well, as soon as it hits an unhinted tuple.

This leads to a couple of questions (some very rough sketches of each idea follow in a postscript at the end of this mail).

First, do we really need to WAL-log CLOG extension at all? Perhaps recovery should simply extend CLOG when it hits a commit or abort record that references a page that doesn't exist yet.

Second, is there any harm in pre-extending CLOG? Currently, we don't extend CLOG until we get to the point where the XID we're allocating is on a page that doesn't exist yet, so no further XIDs can be assigned until the extension is complete. We could avoid that by extending a page in advance. Right now, whenever a backend rolls onto a new CLOG page, it must first create it. What we could do instead is try to stay one page ahead of whatever we're currently using: whenever a backend rolls onto a new CLOG page, it creates *the next page*. That way, it can release XidGenLock first and *then* call ExtendCLOG(). That allows all the other backends to continue allocating XIDs in parallel with the CLOG extension. In theory we could still get a pile-up if the entire page's worth of XIDs gets used up before we can finish the extension, but that should be pretty rare. (Alternatively, we could introduce a separate background process to extend CLOG, and just have foreground processes kick it periodically. This currently seems like overkill to me.)

Third, assuming we do need to write WAL, can we somehow rejigger the logging so that we need not hold CLogControlLock while we're writing it, so that other backends can still do CLOG lookups during that time? Maybe when we take CLogControlLock and observe that extension is needed, we can release CLogControlLock, WAL-log the extension, and then retake CLogControlLock to do SimpleLruZeroPage(). We might need a separate CLogExtensionLock to make sure that two different backends aren't trying to do this dance at the same time, but that should be largely uncontended.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
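
P.S. Some very rough, untested sketches of how the three ideas above might look, purely to make the discussion concrete. Anything marked hypothetical below is a made-up name, not existing code.

For the first question, the redo side could create the page on demand when a commit or abort record points at a CLOG page that doesn't exist yet, instead of replaying a separate zero-page record:

    /*
     * Sketch only: called from commit/abort redo before setting the XID's
     * status.  ClogPageExists() is a hypothetical existence check;
     * TransactionIdToPage(), ClogCtl, SimpleLruZeroPage() and
     * CLogControlLock are the existing clog.c/slru.c names.
     */
    static void
    ExtendCLOGInRecovery(TransactionId xid)
    {
        int     pageno = TransactionIdToPage(xid);

        if (ClogPageExists(pageno))         /* hypothetical */
            return;

        LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
        SimpleLruZeroPage(ClogCtl, pageno); /* just zero it; no WAL needed in recovery */
        LWLockRelease(CLogControlLock);
    }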
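
For the second question, the extension call could move out from under XidGenLock and target the page *after* the one the new XID falls on, so other backends keep allocating XIDs while the new page is created. Below is a sketch of the tail of GetNewTransactionId(), with the existing bookkeeping elided; ExtendCLOGOnePageAhead() and the exact placement are hypothetical:

    xid = ShmemVariableCache->nextXid;
    TransactionIdAdvance(ShmemVariableCache->nextXid);
    /* ... existing ExtendSUBTRANS() call and proc bookkeeping elided ... */
    LWLockRelease(XidGenLock);

    /*
     * If we just rolled onto a new CLOG page (that page already exists,
     * because the previous rollover created it), create the one after it.
     */
    if (TransactionIdToPgIndex(xid) == 0)       /* clog.c's private macro */
        ExtendCLOGOnePageAhead(xid);            /* hypothetical: zeroes xid's page + 1 */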
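
For the third question, the zero-page WAL record could be written without holding CLogControlLock, with a new lock serializing would-be extenders. CLogExtensionLock and ClogPageExists() are hypothetical; the rest mirrors the existing ZeroCLOGPage()/WriteZeroPageXlogRec() split:

    /*
     * Sketch only: WAL-log the extension first, then take CLogControlLock
     * just long enough to zero the page in shared memory, so other
     * backends can keep doing CLOG lookups while the WAL insert happens.
     */
    static void
    ZeroCLOGPageConcurrently(int pageno)
    {
        LWLockAcquire(CLogExtensionLock, LW_EXCLUSIVE);     /* hypothetical lock */

        /* Recheck: someone else may have created the page while we waited. */
        if (!ClogPageExists(pageno))                        /* hypothetical */
        {
            WriteZeroPageXlogRec(pageno);   /* XLogInsert without CLogControlLock held */

            LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
            SimpleLruZeroPage(ClogCtl, pageno);
            LWLockRelease(CLogControlLock);
        }

        LWLockRelease(CLogExtensionLock);
    }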