Re: Problem with PITR recovery - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Problem with PITR recovery |
Date | |
Msg-id | 1114018730.16721.2299.camel@localhost.localdomain |
In response to | Re: Problem with PITR recovery (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Problem with PITR recovery
Re: Problem with PITR recovery |
List | pgsql-hackers |
On Mon, 2005-04-18 at 23:20 +0100, Simon Riggs wrote:
> My plan would be to write a special xlog record for xlog switching. This
> would be a special processing instruction, rather than a data/redo
> instruction. This would be implemented as another xlog info value on
> the xlog_redo resource manager function, XLOG_FILE_SWITCH. (xlog_redo
> would simply set a variable to be used elsewhere.)
>
> When written the xlog switch instruction (XLogInsert) would switch to a
> new xlog, just as if a file had been filled, causing it to be
> immediately archived.

This has been mostly implemented and posted to PATCHES, though I have a
later patch also.

There are some points still to discuss.

Setting the pointer seems to work, but there are 3 pointers, each
protected by a separate lock. All of those are designed to be taken and
held independently.

My understanding is that the correct locking order would be:
WALInsertLock
WALWriteLock
info_lck

XLogInsert uses info_lck first, but then checks everything again once it
acquires WALInsertLock.

To switch files, we must ensure that nobody can insert xlrecs with a
record pointer higher than the log switch record. This is different from
checkpoints, where a checkpoint record can actually occur before records
which are logically after it; that must never happen with a log switch,
else we'd miss them entirely on WAL replay.

Next, from XLogInsert with WALInsertLock held, we wait to acquire
WALWriteLock, since an I/O might be in progress currently. When we have
this, we then issue an XLogWrite, during which we update the record
pointer, which is then propagated through to info_lck.

AFAICS this is the only case of unconditionally acquiring all 3 locks.

Do we agree that this is the correct lock sequence, and if it is, do we
think that this leaves open the chance of deadlock at any stage?

> A shutdown checkpoint would also have the same effect as an
> XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
> away the file. Otherwise, we'd have a problem as to which order to write
> the messages in at shutdown time. (Not happy about that bit, so
> suggestions welcome...)

Treating shutdown checkpoint markers as xlog switches is possible but
gives problems since archive_command is a SUSET variable. On replay we
wouldn't necessarily know whether a shutdown checkpoint was treated as
an xlog switch when it was written, so we'd need to attempt to switch
and look beyond the checkpoint marker, just in case. That makes me
uncomfortable. Hmmm...

Best Regards, Simon Riggs
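The lock sequence proposed above can be illustrated with a small toy model. This is only a sketch in Python, not PostgreSQL code: the three lock names come from the message, while `OrderedLocks`, `switch_xlog`, the `state` dictionary keys and the segment arithmetic are hypothetical names invented for the illustration. It shows the general idea that if every thread takes the locks in one agreed order, a circular wait between them cannot form.

```python
import threading

# Assumed ranking, following the order proposed in the message:
# WALInsertLock, then WALWriteLock, then info_lck.
LOCK_ORDER = ("WALInsertLock", "WALWriteLock", "info_lck")
RANK = {name: i for i, name in enumerate(LOCK_ORDER)}


class OrderedLocks:
    """Toy lock set that rejects acquisitions violating LOCK_ORDER."""

    def __init__(self):
        self._locks = {name: threading.Lock() for name in LOCK_ORDER}
        self._held = threading.local()  # per-thread stack of held lock names

    def _stack(self):
        if not hasattr(self._held, "names"):
            self._held.names = []
        return self._held.names

    def acquire(self, name):
        stack = self._stack()
        # A thread may only take a lock ranked after everything it holds.
        if stack and RANK[name] <= RANK[stack[-1]]:
            raise RuntimeError(f"lock order violation: {name} after {stack[-1]}")
        self._locks[name].acquire()
        stack.append(name)

    def release(self, name):
        stack = self._stack()
        if not stack or stack[-1] != name:
            raise RuntimeError(f"{name} is not the most recently acquired lock")
        stack.pop()
        self._locks[name].release()


def switch_xlog(locks, state, segment_size):
    """Hypothetical log switch: take all three locks in order, then move
    every position to the start of the next segment."""
    locks.acquire("WALInsertLock")          # block further record inserts
    try:
        locks.acquire("WALWriteLock")       # wait out any write in progress
        try:
            next_start = (state["insert_pos"] // segment_size + 1) * segment_size
            state["insert_pos"] = next_start
            state["write_pos"] = next_start
            locks.acquire("info_lck")       # publish the shared copy last
            try:
                state["shared_pos"] = next_start
            finally:
                locks.release("info_lck")
        finally:
            locks.release("WALWriteLock")
    finally:
        locks.release("WALInsertLock")
    return next_start
```

In this model a thread that still holds `info_lck` and then asks for `WALInsertLock` gets a `RuntimeError`, which is the kind of out-of-order acquisition the deadlock question is about. Whether the real code paths ever hold the locks in a conflicting order is exactly what the message leaves open.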