Hot Standby: Caches and Locks - Mailing list pgsql-hackers
From | Simon Riggs
---|---
Subject | Hot Standby: Caches and Locks
Date |
Msg-id | 1224597980.27145.90.camel@ebony.2ndQuadrant
Responses | Re: Hot Standby: Caches and Locks
List | pgsql-hackers

The next stage is handling locks and proc interactions. While this has been on the Wiki for a while, I have made a few more improvements, so please read it again now.

Summary of Proposed Changes
---------------------------

* New RMgr using rmid==8 => RM_RELATION_ID (which fills the last gap)
* Write a new WAL message, XLOG_RELATION_INVAL, immediately prior to commit
* LockAcquire() writes a new WAL message, XLOG_RELATION_LOCK
* Startup process queues sinval messages when it sees XLOG_RELATION_INVAL
* Startup process takes and holds an AccessExclusiveLock when it processes an XLOG_RELATION_LOCK message
* At xact_commit_redo we fire sinval messages and then release locks for that transaction

Explanations
------------

All read-only transactions need to maintain various caches: relcache, catcache and smgr cache. These caches will be maintained on each backend normally, re-reading catalog tables when invalidation messages are received. Invalidation messages will be sent by the Startup process. The Startup process will not maintain its own copy of the caches, so it will never receive messages, only send them.

XLOG_RELATION_INVAL messages will be sent immediately prior to commit (only) using a new function LogCacheInval(), and also during EndNonTransactionalInvalidation(). We do nothing at subtransaction commit. The WAL record will contain a simple contiguous array of the SharedInvalidationMessage(s) that need to be sent. If there is nothing to do, no WAL record is written.

We can't send sinval messages after commit in case we crash and fail to write WAL for them. We can't augment the commit/abort records because we must also cater for non-transactional invalidations, plus commit xlrecs are already complex enough. So we log invalidations prior to commit, queue them and then trigger the send at commit (if it happens). We need do nothing in the abort case because we are not maintaining our own caches in the Startup process. In the non-transactional invalidation case we would process the WAL records immediately.

The Startup process will need to initialise using SharedInvalBackendInit(), which is not normally executed by auxiliary processes. Startup would call this from AuxiliaryProcessMain() just before we call StartupXLOG(). We will need an extra slot in the state arrays to allow for the Startup process. The Startup process needs to reset its sinval nextMsgNum so that everybody thinks it has read its messages. It will be unprepared to handle catchup requests if they are received for some reason, since only the Startup process is sending messages at this point.

The Startup process will continue to use XLogReadBuffer(), minimising the changes required in the current ResourceManagers' _redo functions (there are still some; see later). It also means that read-only backends will use ReadBuffer() calls normally, so again, no changes are required throughout the normal executor code.

Locks will be taken by the Startup process when it receives a new WAL message. XLOG_RELATION_LOCK messages will be sent each time a backend *successfully* acquires an AccessExclusiveLock (only). We send it immediately after the lock acquisition, which means we will often be sending lock requests with no TransactionId assigned, so the slotId is essential in tying up the lock request with the commit that later releases it, since the commit does not include the vxid. (A rough sketch of what such a record might carry is below.)

In recovery, transactions will not be permitted to take any lock higher than AccessShareLock on an object, nor assign a TransactionId. This should also prevent us from writing WAL, but we protect against that specifically as well, just in case. (Maybe we can relax that to an Assert sometime later.)
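
To make that concrete, here is a rough sketch of the primary-side logging under the proposed scheme. The record layout and the helper name LogAccessExclusiveLock() are purely illustrative, not settled code; the real definitions would live with the new RM_RELATION_ID rmgr:

#include "postgres.h"
#include "access/xlog.h"
#include "storage/backendid.h"
#include "storage/lock.h"

/* Illustrative body of an XLOG_RELATION_LOCK record */
typedef struct xl_relation_lock
{
    BackendId   slotId;     /* sinval slot of the acquiring backend; ties
                             * the lock to the commit that releases it */
    Oid         dbOid;      /* database containing the relation */
    Oid         relOid;     /* relation being locked */
} xl_relation_lock;

/*
 * Called immediately after a backend successfully acquires an
 * AccessExclusiveLock, so that the Startup process can re-take the
 * same lock during recovery.
 */
static void
LogAccessExclusiveLock(Oid dbOid, Oid relOid)
{
    xl_relation_lock xlrec;
    XLogRecData rdata;

    xlrec.slotId = MyBackendId;
    xlrec.dbOid = dbOid;
    xlrec.relOid = relOid;

    rdata.data = (char *) &xlrec;
    rdata.len = sizeof(xl_relation_lock);
    rdata.buffer = InvalidBuffer;
    rdata.next = NULL;

    (void) XLogInsert(RM_RELATION_ID, XLOG_RELATION_LOCK, &rdata);
}
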
We can dirty data blocks, but only to set hint bits. (That's another reason to differentiate between those two cases anyway.) Note that in recovery we will always be allowed to set hint bits - no need to check for asynchronous commits. All other actions which cause dirty data blocks should not be allowed, though this will be just an Assert. Specifically, HOT pruning will not be allowed in recovery mode.

Since read-only backends will only be allowed to take AccessShareLocks, there will be no lock conflicts apart from with AccessExclusiveLocks. (If we allowed higher levels of lock we would then need to maintain Multitrans to examine lock details, which we would also rather avoid.) So the Startup process will not take, hold or release relation locks for any purpose, *apart* from when AccessExclusiveLocks (AELs) are required, and we will send WAL messages *only* for AELs.

The Startup process will emulate locking behaviour for transactions that require AELs. AELs will be held by first inserting a dummy TransactionLock entry into the lock table with the TransactionId of the transaction that requests the lock. Then the lock entry will be made. Locks will be released when processing a transaction commit, abort or shutdown checkpoint message, and the lock table entry for the transaction will be removed.

Any AEL request that conflicts with an existing lock will cause some action: if it conflicts with an existing AEL then we issue a WARNING; this should never happen, but if it does it indicates that the last transaction died with a FATAL error without writing an abort record. If the AEL request conflicts with a read-only backend then we wait for a while (as discussed previously) and then the read-only backend will receive a cancel message to make it go away.

If the Startup process crashes it is a PANIC anyway, so there are no difficulties in cleanup for the lock manager with this approach.

The LOCK TABLE command applies an AccessExclusiveLock by default. This will generate WAL messages when executed on the primary node. When executed on the standby node the default will be to issue an AccessShareLock. Any LOCK TABLE command that runs on the standby and requests a specific lock type other than AccessShareLock will be rejected.

Note that it will not be possible to deadlock, since the Startup process will receive only "already held" lock requests, and the query backends will not be allowed to request locks that could cause deadlocks. This is important because the Startup process should never die because of a deadlock; it should always be the "other guy", else we should probably PANIC.

Advisory locks seem to be a problem here. My initial thought is to just prevent them from working during Hot Standby. We may relax that restriction in a later release.

The sinvaladt message-handling code needs little change. It is already generalised to allow any process to put messages onto the queue without keeping per-backend state for those messages.

The lock-message code needs to be generalised to allow the Startup process to request locks by proxy for the transactions it is emulating. The majority of the refactoring will occur here. Fiddly, but no problems foreseen.
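
To illustrate the proxy idea (again, a sketch only: the record layout is the one sketched above, and exactly how the lock is tagged with the originating slotId so the later commit/abort can release it is the part that needs generalising), the redo routine run by the Startup process might look roughly like this:

#include "postgres.h"
#include "storage/lock.h"

/*
 * Sketch: Startup process re-acquires an AccessExclusiveLock on behalf
 * of the transaction that logged it on the primary.
 */
static void
relation_redo_lock(xl_relation_lock *xlrec)
{
    LOCKTAG     tag;

    SET_LOCKTAG_RELATION(tag, xlrec->dbOid, xlrec->relOid);

    /*
     * dontWait = false: if a read-only backend holds a conflicting
     * AccessShareLock we wait, and eventually cancel it as described
     * above.
     */
    (void) LockAcquire(&tag, AccessExclusiveLock, false, false);
}
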
Have I missed anything? Would anybody like more details anywhere?

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support