Thread: Hot Standby: Caches and Locks

Hot Standby: Caches and Locks

From: Simon Riggs
Next stage is handling locks and proc interactions. While this has been
on Wiki for a while, I have made a few more improvements, so please read
again now.

Summary of Proposed Changes
---------------------------

* New RMgr using rmid==8 => RM_RELATION_ID (which fills the last gap)
* Write new WAL message, XLOG_RELATION_INVAL, immediately prior to commit
* LockAcquire() writes new WAL message, XLOG_RELATION_LOCK
* Startup process queues sinval message when it sees XLOG_RELATION_INVAL
* Startup process takes and holds AccessExclusiveLock when it processes
XLOG_RELATION_LOCK message
* At xact_commit_redo we fire sinval messages and then release locks for
that transaction

Explanations 
------------

All read-only transactions need to maintain various caches: relcache,
catcache and smgr cache. These caches will be maintained on each
backend normally, re-reading catalog tables when invalidation messages
are received.

Invalidation messages will be sent by the Startup process. The Startup
process will not maintain its own copy of the caches, so will never
receive messages, only send them. XLOG_RELATION_INVAL messages will be
sent immediately prior to commit (only) using a new function,
LogCacheInval(), and also during EndNonTransactionalInvalidation(). We
do nothing at subtransaction commit. The WAL record will contain a
simple contiguous array of SharedInvalidationMessage(s) that need to be
sent. If there is nothing to send, no WAL record is written.
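A minimal sketch of that record layout, in Python for illustration only. The real SharedInvalidationMessage is a C union; here it is modeled as a fixed-size triple of stand-in fields, purely to show the "contiguous array, or no record at all" behaviour:

```python
import struct

# Stand-in for SharedInvalidationMessage: modeled as a fixed-size
# (id, dbId, hashValue) triple; field names are invented for this sketch.
MSG_FMT = "iii"                     # 12 bytes per message here
MSG_SIZE = struct.calcsize(MSG_FMT)

def build_inval_record(messages):
    """Return the XLOG_RELATION_INVAL payload, or None if there is
    nothing to send (in which case no WAL record is written at all)."""
    if not messages:
        return None
    return b"".join(struct.pack(MSG_FMT, *m) for m in messages)

def redo_inval_record(payload):
    """Startup-side redo: unpack the contiguous array so each message
    can be queued onto the sinval queue."""
    nmsgs = len(payload) // MSG_SIZE
    return [struct.unpack_from(MSG_FMT, payload, i * MSG_SIZE)
            for i in range(nmsgs)]
```

The payload is just the array itself; the record header already carries the length, so the message count can be derived rather than stored.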

We can't send sinval messages after commit in case we crash and fail to
write WAL for them. We can't augment the commit/abort messages because
we must cater for non-transactional invalidations also, plus commit
xlrecs are already complex enough. So we log invalidations prior to
commit, queue them and then trigger the send at commit (if it happens).
We need do nothing in the abort case because we are not maintaining our
own caches in the Startup process. In the nontransactional invalidation
case we would process WAL records immediately.
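The ordering rules above can be sketched as a tiny redo loop (all names invented for illustration): invalidations are queued when their record is replayed, fired only at commit, silently dropped at abort, and applied immediately in the nontransactional case.

```python
pending = {}       # xid -> invalidation messages queued before commit
sinval_queue = []  # messages actually sent to read-only backends

def redo_record(rec):
    kind, xid, msgs = rec
    if kind == "inval":             # XLOG_RELATION_INVAL, just before commit
        pending.setdefault(xid, []).extend(msgs)
    elif kind == "commit":          # trigger the queued send at commit
        sinval_queue.extend(pending.pop(xid, []))
    elif kind == "abort":           # nothing to do: Startup keeps no caches
        pending.pop(xid, None)
    elif kind == "nontransactional":
        sinval_queue.extend(msgs)   # processed immediately
```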

Startup process will need to initialise using SharedInvalBackendInit()
which is not normally executed by auxiliary processes. Startup would
call this from AuxiliaryProcessMain() just before we call StartupXLOG().
We will need an extra slot in state arrays to allow for Startup process.

Startup process needs to reset its sinval nextMsgNum so that everybody
thinks it has read all messages. It will be unprepared to handle catchup
requests, but none should arrive, since only the Startup process is
sending messages at this point.
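The reset amounts to pointing the Startup process's read position at the queue tail, so it always appears fully caught up. A toy model (invented names, not the real sinvaladt structures):

```python
class SinvalQueueSketch:
    """Toy model of the sinval queue: each reader slot keeps a
    nextMsgNum read position into a shared message list."""

    def __init__(self):
        self.msgs = []
        self.next_msg_num = {}            # backend slot -> read position

    def register(self, slot):
        self.next_msg_num[slot] = len(self.msgs)

    def send(self, msg):
        self.msgs.append(msg)

    def reset_reader(self, slot):
        # What Startup does: skip everything unread, since it keeps
        # no caches of its own and never needs to apply messages.
        self.next_msg_num[slot] = len(self.msgs)

    def unread(self, slot):
        return self.msgs[self.next_msg_num[slot]:]
```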

Startup process will continue to use XLogReadBuffer(), minimising the
changes required in current ResourceManager's _redo functions - there
are still some, see later. It also means that read-only backends will
use ReadBuffer() calls normally, so again, no changes required
throughout the normal executor code.

Locks will be taken by the Startup process when it receives a new WAL
message. XLOG_RELATION_LOCK messages will be sent each time a backend
*successfully* acquires an AccessExclusiveLock (only). We send it
immediately after the lock acquisition, which means we will often be
sending lock requests with no TransactionId assigned, so the slotId is
essential in tying up the lock request with the commit that later
releases it, since the commit does not include the vxid.

In recovery, transactions will not be permitted to take any lock higher
than AccessShareLock on an object, nor assign a TransactionId. This
should also prevent us from writing WAL, but we protect against that
specifically as well, just in case. (Maybe we can relax that to Assert
sometime later). We can dirty data blocks but only to set hint bits.
(That's another reason to differentiate between those two cases anyway).
Note that in recovery, we will always be allowed to set hint bits - no
need to check for asynchronous commits. All other actions which cause
dirty data blocks should not be allowed, though this will be just an
Assert. Specifically, HOT pruning will not be allowed in recovery mode.

Since read-only backends will only be allowed to take AccessShareLocks
there will be no lock conflicts apart from with AccessExclusiveLocks.
(If we allowed higher levels of lock we would then need to maintain
Multitrans to examine lock details, which we would also rather avoid).
So Startup process will not take, hold or release relation locks for any
purpose, *apart* from when AccessExclusiveLocks (AELs) are required. So
we will send WAL messages *only* for AELs.

The Startup process will emulate locking behaviour for transactions that
require AELs. AELs will be held by first inserting a dummy
TransactionLock entry into the lock table with the TransactionId of the
transaction that requests the lock. Then the lock entry will be made.
Locks will be released when processing a transaction commit, abort or
shutdown checkpoint message, and the lock table entry for the
transaction will be removed.
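The emulation described above can be sketched as follows (function and variable names are invented here): the dummy transaction-lock entry is keyed by the primary transaction's id, AELs accumulate under it, and replaying a commit, abort, or shutdown checkpoint drops the whole entry.

```python
transaction_locks = {}   # xid -> set of (dbOid, relOid) AELs held by proxy

def redo_relation_lock(xid, db_oid, rel_oid):
    # Dummy TransactionLock entry is created on the first lock for this
    # xid; subsequent AELs for the same transaction hang off it.
    transaction_locks.setdefault(xid, set()).add((db_oid, rel_oid))

def release_locks_for(xid):
    # Called when replaying a commit or abort record for xid; a shutdown
    # checkpoint would do this for every remaining entry, since no
    # transactions can still be in progress at that point.
    return transaction_locks.pop(xid, set())
```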

Any AEL request that conflicts with an existing lock will cause some
action: if it conflicts with an existing AEL then we issue a WARNING;
this should never have happened, but if it has it indicates that the
last transaction died with a FATAL error without writing an abort
record. If the AEL request conflicts with a read-only backend then we
wait for a while (as discussed previously), after which the read-only
backend will receive a cancel message to make it go away.
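The two conflict cases reduce to a simple decision, sketched here with invented names and return values:

```python
def resolve_ael_conflict(holder_kind):
    """Decide what the Startup process does when a replayed
    AccessExclusiveLock request conflicts with an existing holder."""
    if holder_kind == "stale_ael":
        # A leftover AEL means the owning transaction died with a FATAL
        # error and never wrote an abort record: warn and take it over.
        return "WARNING: stale AccessExclusiveLock; releasing it"
    if holder_kind == "read_only_backend":
        # Wait for a while, then cancel the conflicting query so that
        # recovery can proceed.
        return "wait, then send cancel to the read-only backend"
    return "no conflict"
```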

If the Startup process crashes it is a PANIC anyway, so there are no
difficulties in cleanup for the lock manager with this approach.

The LOCK TABLE command by default applies an AccessExclusiveLock. This
will generate WAL messages when executed on the primary node. When
executed on the standby node the default will be to issue an
AccessShareLock. Any LOCK TABLE command that runs on the standby and
requests a specific lock type other than AccessShareLock will be
rejected.
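That rule can be sketched as a small check (the lock-mode ordering mirrors PostgreSQL's, but the function and error are invented for illustration):

```python
ACCESS_SHARE = 1        # weakest lock mode
ACCESS_EXCLUSIVE = 8    # strongest lock mode; the LOCK TABLE default
                        # on the primary

def standby_lock_table(requested_mode=None):
    """On the standby, LOCK TABLE defaults to AccessShareLock; any
    explicit request for a stronger mode is rejected."""
    if requested_mode is None:
        return ACCESS_SHARE             # standby default
    if requested_mode > ACCESS_SHARE:
        raise PermissionError("cannot acquire that lock mode during recovery")
    return requested_mode
```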

Note that it will not be possible to deadlock, since the Startup process
will receive only "already held" lock requests, and the query backends
will not be allowed to request locks that could cause deadlocks. This is
important because the Startup process should never die because of a
deadlock, it should always be the "other guy", else we probably should
PANIC. Advisory locks seem to be a problem here. My initial thought is
to just prevent them from working during Hot Standby. We may relax that
restriction in a later release.

Code for sinvaladt message handling needs little change. It is already
generalised to allow any process to put messages onto the queue without
keeping state on a per-backend basis for those messages.

Code for locks messages needs to be generalised to allow the Startup
process to request locks by proxy for the transactions it is emulating.
The majority of the refactoring will occur here. Fiddly, but no problems
foreseen.

Have I missed anything? Would anybody like more details anywhere?

-- Simon Riggs
   www.2ndQuadrant.com    PostgreSQL Training, Services and Support



Re: Hot Standby: Caches and Locks

From: Simon Riggs
On Tue, 2008-10-21 at 15:06 +0100, Simon Riggs wrote:

> We can't augment the commit/abort messages because
> we must cater for non-transactional invalidations also, plus commit
> xlrecs are already complex enough. So we log invalidations prior to
> commit, queue them and then trigger the send at commit (if it
> happens).

Augmenting the commit messages seems like the better approach. It allows
invalidation messages to be fired as they are read off the xlrec. Still
need the additional message type to handle nontransactional
invalidation. There are other messages possibly more complex than this
already.

-- Simon Riggs
   www.2ndQuadrant.com    PostgreSQL Training, Services and Support



Re: Hot Standby: Caches and Locks

From: Tom Lane
Simon Riggs <simon@2ndQuadrant.com> writes:
>> We can't augment the commit/abort messages because
>> we must cater for non-transactional invalidations also, plus commit
>> xlrecs are already complex enough. So we log invalidations prior to
>> commit, queue them and then trigger the send at commit (if it
>> happens).

> Augmenting the commit messages seems like the better approach. It allows
> invalidation messages to be fired as they are read off the xlrec. Still
> need the additional message type to handle nontransactional
> invalidation. There are other messages possibly more complex than this
> already.

I guess I hadn't been paying attention, but: adding syscache inval
traffic to WAL seems like a completely horrid idea, both from the
complexity and performance standpoints.  What about using the existing
syscache logic to re-derive inval information from watching the update
operations?
        regards, tom lane


Re: Hot Standby: Caches and Locks

From: Simon Riggs
On Thu, 2008-10-30 at 08:30 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> >> We can't augment the commit/abort messages because
> >> we must cater for non-transactional invalidations also, plus commit
> >> xlrecs are already complex enough. So we log invalidations prior to
> >> commit, queue them and then trigger the send at commit (if it
> >> happens).
> 
> > Augmenting the commit messages seems like the better approach. It allows
> > invalidation messages to be fired as they are read off the xlrec. Still
> > need the additional message type to handle nontransactional
> > invalidation. There are other messages possibly more complex than this
> > already.
> 
> I guess I hadn't been paying attention, but: adding syscache inval
> traffic to WAL seems like a completely horrid idea, both from the
> complexity and performance standpoints.  

Well, it's coming out fairly simple actually. Can you explain where you
think the performance loss is? My expectation is less than a 0.1% WAL
volume overhead for typical systems. My comment this morning was to say
I've managed to augment the commit record, so we're not even sending
many additional messages.

It also makes much of the Hot Standby patch fairly simple, even if it is
large. Write something to WAL, act on it on the other side. I've paid
very close attention to minimising the effects on both the number of
lock acquisitions and total WAL volume, but having said that I expect
there to be many tuning opportunities.

> What about using the existing
> syscache logic to re-derive inval information from watching the update
> operations?

That does sound possible, but it makes some big assumptions about
transactional machinery being in place. It ain't. Subtransactions make
everything about 5 times more difficult, so it seems pretty scary to me.

-- Simon Riggs
   www.2ndQuadrant.com    PostgreSQL Training, Services and Support



Re: Hot Standby: Caches and Locks

From: Tom Lane
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, 2008-10-30 at 08:30 -0400, Tom Lane wrote:
>> What about using the existing
>> syscache logic to re-derive inval information from watching the update
>> operations?

> That does sound possible, but it makes some big assumptions about
> transactional machinery being in place. It ain't. Subtransactions make
> everything about 5 times more difficult, so it seems pretty scary to me.

Um.  Yeah, subtransactions would be a PITA.  Never mind that then ...
        regards, tom lane


Re: Hot Standby: Caches and Locks

From: Simon Riggs
On Thu, 2008-10-30 at 10:13 +0000, Simon Riggs wrote:
> On Tue, 2008-10-21 at 15:06 +0100, Simon Riggs wrote:
> 
> > We can't augment the commit/abort messages because
> > we must cater for non-transactional invalidations also, plus commit
> > xlrecs are already complex enough. So we log invalidations prior to
> > commit, queue them and then trigger the send at commit (if it
> > happens).
> 
> Augmenting the commit messages seems like the better approach. It allows
> invalidation messages to be fired as they are read off the xlrec. Still
> need the additional message type to handle nontransactional
> invalidation. There are other messages possibly more complex than this
> already.

Just a quick note to say that this approach has worked fine and I now
have both cache invalidation and locking working correctly.

Rather than submit something now in an unseemly rush, I'll tidy it up
and post it to the list tomorrow.

-- Simon Riggs
   www.2ndQuadrant.com    PostgreSQL Training, Services and Support