Thread: Hot standby, conflict resolution

Hot standby, conflict resolution

From
Heikki Linnakangas
Date:
The FATAL and ERROR cancellation modes are quite different. In FATAL 
mode, you just want to kill the backend. The target connection doesn't 
need to know the LSN.

In ERROR mode, you don't really want to interrupt the target backend. In 
ReadBuffer, you're checking a global variable, 
BufferRecoveryConflictPending on each call, and if it's set, you check 
the buffer's LSN against the LSN of the earliest LSN conflicting LSN, 
and throw an error if it's greater than that. Why do we jump through so 
many hoops to get the earliest conflicting LSN to where it's needed? At 
the moment:

1. Startup process sets the LSN in the target backend's PGPROC entry, 
and signals it with SIGINT.
2. The target backend receives the signal; ProcessInterrupts is called 
either immediately or at the next CHECK_FOR_INTERRUPTS() call.
3. ProcessInterrupts reads the value from PGPROC, and passes it to bufmgr.c

ISTM that if ReadBuffer read the value directly from the PGPROC entry, 
there would be no need for the signaling (in the ERROR mode).

Correct me if I'm wrong, but I thought the idea of this new conflict 
resolution was that the startup process doesn't need to wait for the 
target backend to die. Instead, the target backend knows to commit 
suicide if it stumbles into a buffer that's been modified in a 
conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
looks like we still wait.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, conflict resolution

From
Simon Riggs
Date:
On Fri, 2009-01-23 at 17:51 +0200, Heikki Linnakangas wrote:

> In ERROR mode, you don't really want to interrupt the target backend. In 
> ReadBuffer, you're checking a global variable, 
> BufferRecoveryConflictPending on each call, and if it's set, you check 
> the buffer's LSN against the LSN of the earliest LSN conflicting LSN, 
> and throw an error if it's greater than that. Why do we jump through so 
> many hoops to get the earliest conflicting LSN to where it's needed? At 
> the moment:
> 
> 1. Startup process sets the LSN in the target backend's PGPROC entry, 
> and signals it with SIGINT.
> 2. The target backend receives the signal; ProcessInterrupts is called 
> either immediately or at the next CHECK_FOR_INTERRUPTS() call.
> 3. ProcessInterrupts reads the value from PGPROC, and passes it to bufmgr.c
> 
> ISTM that if ReadBuffer read the value directly from the PGPROC entry, 
> there would be no need for the signaling (in the ERROR mode).

That is possible and I considered it. If we did it that way we would
need to read the PGPROC each time we read a buffer. AFAICS we would need
to use a spinlock to do that since reading an XLogRecPtr would not be
atomic.

So doing it the way I've done it allows us to use a local variable which
can be more easily cached and avoids the locking overhead.

We do still need to signal anyway for the FATAL case, so we're not
significantly affecting the patch footprint by changing that.

> Correct me if I'm wrong, but I thought the idea of this new conflict 
> resolution was that the startup process doesn't need to wait for the 
> target backend to die. Instead, the target backend knows to commit 
> suicide if it stumbles into a buffer that's been modified in a 
> conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
> looks like we still wait.

err, no, that's just an oversight, not intentional.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Hot standby, conflict resolution

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Fri, 2009-01-23 at 17:51 +0200, Heikki Linnakangas wrote:
>> ISTM that if ReadBuffer read the value directly from the PGPROC entry, 
>> there would be no need for the signaling (in the ERROR mode).
> 
> That is possible and I considered it. If we did it that way we would
> need to read the PGPROC each time we read a buffer. AFAICS we would need
> to use a spinlock to do that since reading an XLogRecPtr would not be
> atomic.

Hmm, I think you could make it lock-free by ordering the instructions 
carefully. Not sure it's worth the code complexity, though.

>> Correct me if I'm wrong, but I thought the idea of this new conflict 
>> resolution was that the startup process doesn't need to wait for the 
>> target backend to die. Instead, the target backend knows to commit 
>> suicide if it stumbles into a buffer that's been modified in a 
>> conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
>> looks like we still wait.
> 
> err, no, that's just an oversight, not intentional.

Ok, then I think we have a little race condition. The startup process 
doesn't get any reply indicating that the target backend has processed 
the SIGINT and set the cached conflict LSN. The target backend might 
move ahead using the old LSN for a little while, even though the startup 
process has already gone ahead and replayed a vacuum record.

Another tiny issue is that it looks like a new conflict LSN always 
overwrites the old one. But you should always use the oldest conflicted 
LSN in the checks, not the newest.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, conflict resolution

From
Simon Riggs
Date:
On Fri, 2009-01-23 at 21:30 +0200, Heikki Linnakangas wrote:

> >> Correct me if I'm wrong, but I thought the idea of this new conflict 
> >> resolution was that the startup process doesn't need to wait for the 
> >> target backend to die. Instead, the target backend knows to commit 
> >> suicide if it stumbles into a buffer that's been modified in a 
> >> conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
> >> looks like we still wait.
> > 
> > err, no, that's just an oversight, not intentional.
> 
> Ok, then I think we have a little race condition. The startup process 
> doesn't get any reply indicating that the target backend has processed 
> the SIGINT and set the cached conflict LSN. The target backend might 
> move ahead using the old LSN for a little while, even though the startup 
> process has already gone ahead and replayed a vacuum record.

Hah! You had me either way. Very neat :-)

I'll think some more and reply, but not tonight.

> Another tiny issue is that it looks like a new conflict LSN always 
> overwrites the old one. But you should always use the oldest conflicted 
> LSN in the checks, not the newest.

OK

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Hot standby, conflict resolution

From
Simon Riggs
Date:
On Fri, 2009-01-23 at 21:30 +0200, Heikki Linnakangas wrote:

> Ok, then I think we have a little race condition. The startup process 
> doesn't get any reply indicating that the target backend has
> processed 
> the SIGINT and set the cached conflict LSN. The target backend might 
> move ahead using the old LSN for a little while, even though the
> startup 
> process has already gone ahead and replayed a vacuum record.
> 
> Another tiny issue is that it looks like a new conflict LSN always 
> overwrites the old one. But you should always use the oldest
> conflicted 
> LSN in the checks, not the newest.

That makes it easier, because it is either not set, or it is set and
does not need to be reset as new conflict LSNs appear.

I can see a simple scheme emerging, which I will detail tomorrow
morning.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Hot standby, conflict resolution

From
Simon Riggs
Date:
On Sun, 2009-01-25 at 16:19 +0000, Simon Riggs wrote:
> On Fri, 2009-01-23 at 21:30 +0200, Heikki Linnakangas wrote:
> 
> > Ok, then I think we have a little race condition. The startup process 
> > doesn't get any reply indicating that the target backend has
> > processed 
> > the SIGINT and set the cached conflict LSN. The target backend might 
> > move ahead using the old LSN for a little while, even though the
> > startup 
> > process has already gone ahead and replayed a vacuum record.
> > 
> > Another tiny issue is that it looks like a new conflict LSN always 
> > overwrites the old one. But you should always use the oldest
> > conflicted 
> > LSN in the checks, not the newest.
> 
> That makes it easier, because it is either not set, or it is set and
> does not need to be reset as new conflict LSNs appear.
> 
> I can see a simple scheme emerging, which I will detail tomorrow
> morning.

Rather than signalling, we could use a hasconflict boolean for each proc
in a shared data structure. It can be read without spinlock, but should
only be written while holding spinlock.

Each time we read a block we check if hasconflict is set. If it is, we
grab spinlock, recheck if it is set, if so read the conflict details,
clear the flag and drop the spinlock.


The aim of this type of conflict resolution was to reduce the footprint
of users that would be effected and defer it as much as possible. We've
spent time getting the latestCompletedXid, but we know deriving that
value is very difficult in the btree case at least. So what I would like
to do is pass the relid of a conflict across as well and use that to
reduce the footprint, now that we are performing the test inside the
buffer manager.

We would keep a relid cache with a very small number of relids, perhaps
just one, maybe as many as 4 or 8, so that we can fit relids and
associated LSNs in a single cache line. We can match the relid using a
simple for loop, which we know is well optimised when there is no
dependency between the elements of the loop and the loop has a
compile-time fixed number of iterations.

I would be inclined to make this a separate shared memory area rather
than try to weld that onto PGPROC. We could index that using backendid.

If the relid cache overflows, we just apply a general LSN value.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Hot standby, conflict resolution

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Rather than signalling, we could use a hasconflict boolean for each proc
> in a shared data structure. It can be read without spinlock, but should
> only be written while holding spinlock.
> 
> Each time we read a block we check if hasconflict is set. If it is, we
> grab spinlock, recheck if it is set, if so read the conflict details,
> clear the flag and drop the spinlock.

Yeah, that seems workable.

> The aim of this type of conflict resolution was to reduce the footprint
> of users that would be effected and defer it as much as possible. We've
> spent time getting the latestCompletedXid, but we know deriving that
> value is very difficult in the btree case at least. So what I would like
> to do is pass the relid of a conflict across as well and use that to
> reduce the footprint, now that we are performing the test inside the
> buffer manager.

I agree that would be useful, but I'd prefer to keep it simple for now...

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com