Thread: Why do we have a WAL record for CLOG page extension?
Why do we have a WAL record for CLOG page extension? It seems to be
unnecessary, since other code already covers that failure situation.

We ignore heap extensions, on the assumption that the insertion will
automatically extend the relation when we recover. We also ignore
previously truncated clog pages, so the absence of a clog page at
recovery time is clearly not an issue.

So why not avoid writing WAL when we move to the next clog page and fix
this at recovery time if it is an issue? Well, slru.c *already* includes
code that specifically ignores errors for set/get TransactionIds during
recovery when pages are absent, plus this is supposed to still work even
if this is not the first page of a clog segment. So, the WAL record
seems like it is not required.

Patch removes the xlog record written at new page time. This ensures
that we don't need to wait for an I/O when holding the XidGenLock, which
can be a problem with hundreds of people requesting that lock,
especially since we may need to queue for one of the WAL locks first.

Leave the RMID for clog, in case required later.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
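[To make the extension path discussed above concrete, here is a minimal,
self-contained sketch (not backend code): how an XID maps to a clog page
with the default 8 KB block size and 2 status bits per transaction, and
when an XID lands on a new page, which is the point where the current
code writes the zero-page WAL record while XidGenLock is held. The
helper names below are illustrative, not actual backend functions.]

/*
 * Illustrative sketch only: XID -> clog page mapping and detection of
 * the "first XID on a new page" case.  Constants mirror clog.c for the
 * default 8 KB block size; function names are hypothetical.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE (8 / CLOG_BITS_PER_XACT)        /* 4 */
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)  /* 32768 */

typedef uint32_t TransactionId;

static int xid_to_clog_page(TransactionId xid)
{
    return xid / CLOG_XACTS_PER_PAGE;
}

/* A new clog page is needed exactly when the XID is the first one on it. */
static bool xid_starts_new_clog_page(TransactionId xid)
{
    return (xid % CLOG_XACTS_PER_PAGE) == 0;
}

int main(void)
{
    TransactionId xids[] = {32767, 32768, 32769, 65536};

    for (int i = 0; i < 4; i++)
    {
        printf("xid %u -> clog page %d%s\n",
               (unsigned) xids[i],
               xid_to_clog_page(xids[i]),
               xid_starts_new_clog_page(xids[i])
                   ? "  (new page: zero-page WAL record written here,"
                     " while XidGenLock is held)"
                   : "");
    }
    return 0;
}

[With these constants, 32,768 transactions fit on one clog page, so the
extension path, and hence the WAL write under XidGenLock, is hit once
per 32,768 XID assignments.]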
Simon Riggs <simon@2ndquadrant.com> writes:
> Why do we have a WAL record for CLOG page extension?

Think of it as being a substitute for a full-page image, or perhaps the
init-bit in heap-insert records is the closest analogy. Your patch is
unworkable because it provides no means for recovering from
corrupt-on-disk clog pages.

			regards, tom lane
On Tue, 2006-06-06 at 10:59 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Why do we have a WAL record for CLOG page extension?
>
> Think of it as being a substitute for a full-page image, or perhaps the
> init-bit in heap-insert records is the closest analogy. Your patch is
> unworkable because it provides no means for recovering from
> corrupt-on-disk clog pages.

The current system only helps recover from corrupt-on-disk pages that
have been newly written since the last checkpoint (or maybe two
checkpoints ago). If the pages were initialised before that then we've
simply lost that data altogether anyhow. Seems like we're writing WAL
for a fairly rare situation, yet we still have some big issues.

Should we hold two copies of the clog for robustness? That way we can
just switch to the second copy if we have problems.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
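[For what the "two copies of the clog" idea might look like, here is a
rough, hypothetical sketch: each clog page write goes to both a primary
and a mirror file, and a failed read of the primary falls back to the
mirror. None of these file names or helpers exist in the backend; this
only illustrates the proposal, not an implementation.]

/*
 * Hypothetical sketch of mirrored clog page storage.  All paths and
 * helper names are made up for illustration.
 */
#include <stdio.h>
#include <string.h>

#define CLOG_PAGE_SIZE 8192

/* Write one clog page to both copies; return 0 on success. */
static int write_clog_page_mirrored(const char *primary, const char *mirror,
                                    long pageno, const char page[CLOG_PAGE_SIZE])
{
    const char *paths[2] = {primary, mirror};

    for (int i = 0; i < 2; i++)
    {
        FILE *f = fopen(paths[i], "r+b");
        if (f == NULL)
            f = fopen(paths[i], "w+b");
        if (f == NULL)
            return -1;
        if (fseek(f, pageno * CLOG_PAGE_SIZE, SEEK_SET) != 0 ||
            fwrite(page, 1, CLOG_PAGE_SIZE, f) != CLOG_PAGE_SIZE)
        {
            fclose(f);
            return -1;
        }
        fclose(f);
    }
    return 0;
}

/* Read a page, preferring the primary copy and falling back to the mirror. */
static int read_clog_page_mirrored(const char *primary, const char *mirror,
                                   long pageno, char page[CLOG_PAGE_SIZE])
{
    const char *paths[2] = {primary, mirror};

    for (int i = 0; i < 2; i++)
    {
        FILE *f = fopen(paths[i], "rb");
        if (f == NULL)
            continue;
        if (fseek(f, pageno * CLOG_PAGE_SIZE, SEEK_SET) == 0 &&
            fread(page, 1, CLOG_PAGE_SIZE, f) == CLOG_PAGE_SIZE)
        {
            fclose(f);
            return 0;
        }
        fclose(f);
    }
    return -1;
}

int main(void)
{
    char page[CLOG_PAGE_SIZE];
    memset(page, 0, sizeof(page));

    if (write_clog_page_mirrored("clog_primary", "clog_mirror", 0, page) == 0 &&
        read_clog_page_mirrored("clog_primary", "clog_mirror", 0, page) == 0)
        puts("page 0 written to both copies and read back");
    return 0;
}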