Thread: An example of bugs for Hot Standby

An example of bugs for Hot Standby

From
Hiroyuki Yamada
Date:
Hot Standby node can freeze when startup process calls LockBufferForCleanup().
This bug can be reproduced by the following procedure.

0. start Hot Standby, with one active node(node A) and one standby node(node B)
1. create table X and table Y in node A
2. insert several rows in table X in node A
3. delete one row from table X in node A
4. begin xact 1 in node A, execute following commands, and leave xact 1 open
4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE
5. wait until WAL's for above actions are applied in node B
6. begin xact 2 in node B, and execute following commands
6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X;
6.2 FETCH test_cursor;
6.3 SELECT * FROM table Y;
7. execute VACUUM FREEZE table A in node A
8. commit xact 1 in node A

...then in node B occurs following "deadlock" situation, which is not detected by deadlock check.* startup process
waitsfor xact 2 to release buffers in table X (in LockBufferForCleanup())* xact 2 waits for startup process to release
ACCESSEXCLUSIVE lock in table Y
 

This situation can occur whena) a transaction in the standby node tries to acquire ACCESS SHARE lock while holding some
buffersb)startup process calls LockBufferForCleanup() for any of the buffers
 


regards,

-- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net


Re: An example of bugs for Hot Standby

From
"Kevin Grittner"
Date:
Hiroyuki Yamada <yamada@kokolink.net> wrote:
>  4. Startup process tries to redo XLOG_HEAP2_CLEAN record, calls
>     LockBufferForCleanup() and freezes until the Xact 1 ends.
I think they word you're searching for is "blocks".  Blocking to
protect integrity doesn't sound like a bug to me; perhaps an
opportunity for enhancement.

-Kevin


Re: An example of bugs for Hot Standby

From
Simon Riggs
Date:
On Tue, 2009-12-15 at 20:11 +0900, Hiroyuki Yamada wrote:
> Hot Standby node can freeze when startup process calls LockBufferForCleanup().
> This bug can be reproduced by the following procedure.
>
> 0. start Hot Standby, with one active node(node A) and one standby node(node B)
> 1. create table X and table Y in node A
> 2. insert several rows in table X in node A
> 3. delete one row from table X in node A
> 4. begin xact 1 in node A, execute following commands, and leave xact 1 open
> 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE
> 5. wait until WAL's for above actions are applied in node B
> 6. begin xact 2 in node B, and execute following commands
> 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X;
> 6.2 FETCH test_cursor;
> 6.3 SELECT * FROM table Y;
> 7. execute VACUUM FREEZE table A in node A
> 8. commit xact 1 in node A
>
> ...then in node B occurs following "deadlock" situation, which is not detected by deadlock check.
>  * startup process waits for xact 2 to release buffers in table X (in LockBufferForCleanup())
>  * xact 2 waits for startup process to release ACCESS EXCLUSIVE lock in table Y

Deadlock bug was prevented by stop-gap measure in December commit.

Full resolution patch attached for Startup process waits on buffer pins.

Startup process sets SIGALRM when waiting on a buffer pin. If woken by
alarm we send SIGUSR1 to all backends requesting that they check to see
if they are blocking Startup process. If so, they throw ERROR/FATAL as
for other conflict resolutions. Deadlock stop gap removed.
max_standby_delay = -1 option removed to prevent deadlock.

Reviews welcome, otherwise commit at end of week.

--
 Simon Riggs           www.2ndQuadrant.com

Attachment

Re: An example of bugs for Hot Standby

From
Andres Freund
Date:
On Tuesday 19 January 2010 11:46:24 Simon Riggs wrote:
> On Tue, 2009-12-15 at 20:11 +0900, Hiroyuki Yamada wrote:
> > Hot Standby node can freeze when startup process calls
> > LockBufferForCleanup(). This bug can be reproduced by the following
> > procedure.
> >
> > 0. start Hot Standby, with one active node(node A) and one standby
> > node(node B) 1. create table X and table Y in node A
> > 2. insert several rows in table X in node A
> > 3. delete one row from table X in node A
> > 4. begin xact 1 in node A, execute following commands, and leave xact 1
> > open 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE
> > 5. wait until WAL's for above actions are applied in node B
> > 6. begin xact 2 in node B, and execute following commands
> > 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X;
> > 6.2 FETCH test_cursor;
> > 6.3 SELECT * FROM table Y;
> > 7. execute VACUUM FREEZE table A in node A
> > 8. commit xact 1 in node A
> >
> > ...then in node B occurs following "deadlock" situation, which is not
> > detected by deadlock check.
> >
> >  * startup process waits for xact 2 to release buffers in table X (in
> >  LockBufferForCleanup()) * xact 2 waits for startup process to release
> >  ACCESS EXCLUSIVE lock in table Y
>
> Deadlock bug was prevented by stop-gap measure in December commit.
>
> Full resolution patch attached for Startup process waits on buffer pins.
>
> Startup process sets SIGALRM when waiting on a buffer pin. If woken by
> alarm we send SIGUSR1 to all backends requesting that they check to see
> if they are blocking Startup process. If so, they throw ERROR/FATAL as
> for other conflict resolutions. Deadlock stop gap removed.
> max_standby_delay = -1 option removed to prevent deadlock.
Wouldnt it be more foolproof to also loop around sending the FATAL? Not that
its likely but...

From HoldingBufferPinThatDelaysRecovery youre calling
GetStartupBufferPinWaitBufId - that sounds a bit dangerous because that one is
acquiring a spinlock which can also get taken at other places. Its not the
most likely scenario, but it would certainly be annoying to debug.

Is there any supported platform with sizeof(sig_atomic_t) <4 - I would doubt
so? If not the locking in GetStartupBufferPinWaitBufId and
SetStartupBufferPinWaitBufId shouldnt be needed?

Same issue issue (and more likely to trigger) exists with CheckStandbyTimeout-
>SendRecoveryConflictWithBufferPin->CancelDBBackends

Too tired to look further.

Greetings,
Andres


PS: Sorry for not doing anything on the idle front - I have been burried alive
the last days.


Re: An example of bugs for Hot Standby

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Is there any supported platform with sizeof(sig_atomic_t) <4 - I would doubt 
> so?

Er ... what?  I believe there are live platforms with sig_atomic_t = char.
If we're assuming more that's a must-fix.
        regards, tom lane


Re: An example of bugs for Hot Standby

From
Andres Freund
Date:
On Wednesday 20 January 2010 06:30:28 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Is there any supported platform with sizeof(sig_atomic_t) <4 - I would
> > doubt so?
> 
> Er ... what?  I believe there are live platforms with sig_atomic_t = char.
> If we're assuming more that's a must-fix.
So were assuming genereally that a integer cannot be read/written 
atomatically?
Or was that only relating to the actualy signal.h typedef?

Andres


Re: An example of bugs for Hot Standby

From
Simon Riggs
Date:
On Wed, 2010-01-20 at 06:14 +0100, Andres Freund wrote:
> > 
> > Full resolution patch attached for Startup process waits on buffer pins.
> > 
> > Startup process sets SIGALRM when waiting on a buffer pin. If woken by
> > alarm we send SIGUSR1 to all backends requesting that they check to see
> > if they are blocking Startup process. If so, they throw ERROR/FATAL as
> > for other conflict resolutions. Deadlock stop gap removed.
> > max_standby_delay = -1 option removed to prevent deadlock.

> Wouldnt it be more foolproof to also loop around sending the FATAL? Not that 
> its likely but...

More foolproof and much less accurate. The Startup process doesn't know
who is holding the buffer pin that blocks it, so it could not target a
FATAL.

> From HoldingBufferPinThatDelaysRecovery youre calling 
> GetStartupBufferPinWaitBufId - that sounds a bit dangerous because that one is 
> acquiring a spinlock which can also get taken at other places. Its not the 
> most likely scenario, but it would certainly be annoying to debug.

Spinlock. It isn't held for long in any situation. What problem do you
foresee?

> Is there any supported platform with sizeof(sig_atomic_t) <4 - I would doubt 
> so? If not the locking in GetStartupBufferPinWaitBufId and 
> SetStartupBufferPinWaitBufId shouldnt be needed? 

I prefer spinlocking.

> Same issue issue (and more likely to trigger) exists with CheckStandbyTimeout-
> >SendRecoveryConflictWithBufferPin->CancelDBBackends

I don't see an issue.

-- Simon Riggs           www.2ndQuadrant.com



Re: An example of bugs for Hot Standby

From
Andres Freund
Date:
On Wednesday 20 January 2010 06:30:28 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Is there any supported platform with sizeof(sig_atomic_t) <4 - I would
> > doubt so?
> 
> Er ... what?  I believe there are live platforms with sig_atomic_t = char.
> If we're assuming more that's a must-fix.
The reason I have asked is that the code is doing things like:
/** Used by backends when they receive a request to check for buffer pin waits.*/
int
GetStartupBufferPinWaitBufId(void)
{int bufid;
/* use volatile pointer to prevent code rearrangement */volatile PROC_HDR *procglobal = ProcGlobal;
SpinLockAcquire(ProcStructLock);
bufid = procglobal->startupBufferPinWaitBufId;
SpinLockRelease(ProcStructLock);
return bufid;
}

or similar things with LWLockAcquire in a signal handler which strikes me as a 
not that good idea. As at least on x86 reading an integer is atomic the above 
spinlock is pointless. My cross arch experience is barely existing, so...

Andres


Re: An example of bugs for Hot Standby

From
Andres Freund
Date:
On Wednesday 20 January 2010 10:40:10 Simon Riggs wrote:
> On Wed, 2010-01-20 at 06:14 +0100, Andres Freund wrote:
> > > Full resolution patch attached for Startup process waits on buffer
> > > pins.
> > > 
> > > Startup process sets SIGALRM when waiting on a buffer pin. If woken by
> > > alarm we send SIGUSR1 to all backends requesting that they check to see
> > > if they are blocking Startup process. If so, they throw ERROR/FATAL as
> > > for other conflict resolutions. Deadlock stop gap removed.
> > > max_standby_delay = -1 option removed to prevent deadlock.
> > 
> > Wouldnt it be more foolproof to also loop around sending the FATAL? Not
> > that its likely but...
> 
> More foolproof and much less accurate. The Startup process doesn't know
> who is holding the buffer pin that blocks it, so it could not target a
> FATAL.
> 
> > From HoldingBufferPinThatDelaysRecovery youre calling
> > GetStartupBufferPinWaitBufId - that sounds a bit dangerous because that
> > one is acquiring a spinlock which can also get taken at other places.
> > Its not the most likely scenario, but it would certainly be annoying to
> > debug.
> Spinlock. It isn't held for long in any situation. What problem do you
> foresee?
If any backend is signalled while currently holding the ProcStructLock there 
is a basically unrecoverable deadlock - its not likely but possible.

> > Is there any supported platform with sizeof(sig_atomic_t) <4 - I would
> > doubt so? If not the locking in GetStartupBufferPinWaitBufId and
> > SetStartupBufferPinWaitBufId shouldnt be needed?
> I prefer spinlocking.
Well, its deadlock land taking the same lock inside and outside of a signal 
handler...

Andres


Re: An example of bugs for Hot Standby

From
Simon Riggs
Date:
On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote:
> LWLockAcquire

I'm using spinlocks, not lwlocks.

-- Simon Riggs           www.2ndQuadrant.com



Re: An example of bugs for Hot Standby

From
Andres Freund
Date:
On Wednesday 20 January 2010 10:52:24 Simon Riggs wrote:
> On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote:
> > LWLockAcquire
> 
> I'm using spinlocks, not lwlocks.
CancelDBBackends which is used in SendRecoveryConflictWithBufferPin which in 
turn used by CheckStandbyTimeout triggered by SIGALRM acquires the lwlock.

Now that case is a bit less dangerous because you would have to interrupt 
yourself to trigger a deadlock there because the code sleeps soon after 
setting up the handler.
If ever two SIGALRM occur consecutive there is a problem. 

Andres


Re: An example of bugs for Hot Standby

From
Simon Riggs
Date:
On Wed, 2010-01-20 at 11:04 +0100, Andres Freund wrote:
> On Wednesday 20 January 2010 10:52:24 Simon Riggs wrote:
> > On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote:
> > > LWLockAcquire
> > 
> > I'm using spinlocks, not lwlocks.
> CancelDBBackends which is used in SendRecoveryConflictWithBufferPin which in 
> turn used by CheckStandbyTimeout triggered by SIGALRM acquires the lwlock.

Those are used in similar ways to deadlock detection.

> Now that case is a bit less dangerous because you would have to interrupt 
> yourself to trigger a deadlock there because the code sleeps soon after 
> setting up the handler.
> If ever two SIGALRM occur consecutive there is a problem. 

I'll protect against subsequent calls.

-- Simon Riggs           www.2ndQuadrant.com



Re: An example of bugs for Hot Standby

From
Andres Freund
Date:
On Wednesday 20 January 2010 11:33:05 Simon Riggs wrote:
> On Wed, 2010-01-20 at 11:04 +0100, Andres Freund wrote:
> > On Wednesday 20 January 2010 10:52:24 Simon Riggs wrote:
> > > On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote:
> > > > LWLockAcquire
> > > 
> > > I'm using spinlocks, not lwlocks.
> > 
> > CancelDBBackends which is used in SendRecoveryConflictWithBufferPin which
> > in turn used by CheckStandbyTimeout triggered by SIGALRM acquires the
> > lwlock.
> 
> Those are used in similar ways to deadlock detection.
But only if 
ImmediateInterruptOK && InterruptHoldoffCount == 0 && CritSectionCount == 0 - 
which is not the case with HoldingBufferPinThatDelaysRecovery.

Andres


Re: An example of bugs for Hot Standby

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On Wednesday 20 January 2010 06:30:28 Tom Lane wrote:
>> Er ... what?  I believe there are live platforms with sig_atomic_t = char.
>> If we're assuming more that's a must-fix.

> The reason I have asked is that the code is doing things like:
> [ grabbing a spinlock to read a single integer ]

Yes, I think we probably actually need that.  The problem is not so
much whether the read is an atomic operation as whether you can rely
on getting an up-to-date value.  On multiprocessors with weak memory
ordering you need some type of "sync" instruction to be sure you will
see a value that was recently written by another processor.  Currently,
we embed such instructions in the spinlock acquire/release code.
There's been some discussion of exposing memory sync independently
of lock acquisition; perhaps that would be enough here, but I haven't
looked at the surrounding logic enough to say.

My complaint at the top was responding to the idea that someone might
be supposing the specific type sig_atomic_t was at least as wide as
int.  That's a different matter altogether.  We do assume in some places
that we can read or write the specific type TransactionId indivisibly,
but we don't try to declare it as sig_atomic_t.

> or similar things with LWLockAcquire in a signal handler

[ grows visibly pale ]  *Please* tell me we are not trying to take
locks in a signal handler.  What happens if it interrupts code that
is already holding that lock?
        regards, tom lane


Re: An example of bugs for Hot Standby

From
Andres Freund
Date:
On Wednesday 20 January 2010 17:30:04 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On Wednesday 20 January 2010 06:30:28 Tom Lane wrote:
> >> Er ... what?  I believe there are live platforms with sig_atomic_t =
> >> char. If we're assuming more that's a must-fix.
> > 
> > The reason I have asked is that the code is doing things like:
> > [ grabbing a spinlock to read a single integer ]
> 
> Yes, I think we probably actually need that.  The problem is not so
> much whether the read is an atomic operation as whether you can rely
> on getting an up-to-date value.  On multiprocessors with weak memory
> ordering you need some type of "sync" instruction to be sure you will
> see a value that was recently written by another processor.  Currently,
> we embed such instructions in the spinlock acquire/release code.
> There's been some discussion of exposing memory sync independently
> of lock acquisition; perhaps that would be enough here, but I haven't
> looked at the surrounding logic enough to say.
I think it should be enough.

I realize its way too late in the cycle for that, but why dont we start using 
some library for easy cross platform atomic ops? I think libatomic or such 
should support the required platforms.

> My complaint at the top was responding to the idea that someone might
> be supposing the specific type sig_atomic_t was at least as wide as
> int.  That's a different matter altogether.  We do assume in some places
> that we can read or write the specific type TransactionId indivisibly,
> but we don't try to declare it as sig_atomic_t.
So we already assume that? Fine.

(yes, the sig_atomic_t was a sidetrack - I had memorized it wrongly as "the 
biggest value that can be read/written atomically which is *clearly* wrong)

> > or similar things with LWLockAcquire in a signal handler
> 
> [ grows visibly pale ]  *Please* tell me we are not trying to take
> locks in a signal handler.  What happens if it interrupts code that
> is already holding that lock?
Yes the patch does that at two places. Thats what I was complaining about and 
what triggered my sig_atomic_t question because of the above explained 
misunderstanding.


Andres


Re: An example of bugs for Hot Standby

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I realize its way too late in the cycle for that, but why dont we start using
> some library for easy cross platform atomic ops?

(1) there probably isn't one that does exactly what we want, works
everywhere, and has the right license;
(2) what actual gain would we get?  We've already done the work.

>> [ grows visibly pale ]  *Please* tell me we are not trying to take
>> locks in a signal handler.  What happens if it interrupts code that
>> is already holding that lock?

> Yes the patch does that at two places.

That's a must-fix.
        regards, tom lane


Re: An example of bugs for Hot Standby

From
Andres Freund
Date:
Hi Tom, Hi Simon,

On Wednesday 20 January 2010 17:59:36 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I realize its way too late in the cycle for that, but why dont we start
> > using some library for easy cross platform atomic ops?
> 
> (1) there probably isn't one that does exactly what we want, works
> everywhere, and has the right license;
> (2) what actual gain would we get?  We've already done the work.
That there might be some other instructions were interested in?
Like really atomic increment?

> >> [ grows visibly pale ]  *Please* tell me we are not trying to take
> >> locks in a signal handler.  What happens if it interrupts code that
> >> is already holding that lock?
> > 
> > Yes the patch does that at two places.
> 
> That's a must-fix.
Its code intended to fix a existing problem not already comitted code. But 
otherwise I definitely agree.

Andres


Synchronization primitives (Was: Re: An example of bugs for Hot Standby)

From
Heikki Linnakangas
Date:
Andres Freund wrote:
> On Wednesday 20 January 2010 17:59:36 Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> I realize its way too late in the cycle for that, but why dont we start
>>> using some library for easy cross platform atomic ops?
>> (1) there probably isn't one that does exactly what we want, works
>> everywhere, and has the right license;
>> (2) what actual gain would we get?  We've already done the work.
> That there might be some other instructions were interested in?
> Like really atomic increment?

This reminds me of something I've been pondering for some time:

Streaming Replication introduces a few places with a polling pattern
like this (in pseudocode):

while()
{ /* Check if variable in shared has advanced beoynd X */ SpinLockAcquire() localvar = sharedvar; SpinLockRelease() if
(localvar> X)   break;
 
 /* Not yet. Sleep pg_usleep(100);
}

For example, startup process polls like that to wait for walreceiver to
write & flush new WAL to be replayed. And in master, walsender polls
like that for new WAL to be generated, so that it can be sent to
standby. Hot standby also has a polling loop where it waits for a
transaction a transaction to die, though I'm not sure if that can be fit
into the same model.

That's OK for asynchronous replication, but unacceptable for synchronous
mode.

It would be nice to have a new synchronization primitive for that.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Streaming Replication introduces a few places with a polling pattern
> like this (in pseudocode):

> while()
> {
>   /* Check if variable in shared has advanced beoynd X */
>   SpinLockAcquire()
>   localvar = sharedvar;
>   SpinLockRelease()
>   if (localvar > X)
>     break;

>   /* Not yet. Sleep
>   pg_usleep(100);
> }

I trust there's a CHECK_FOR_INTERRUPTS in there ...

> It would be nice to have a new synchronization primitive for that.

Maybe.  The lock, the variable, the comparison operation, and the sleep
time all seem rather specific to each application.  Not sure that it'd
really buy much to try to turn it into a generic subroutine.
        regards, tom lane


Re: Synchronization primitives (Was: Re: An example of bugs for Hot Standby)

From
Simon Riggs
Date:
On Wed, 2010-01-20 at 20:00 +0200, Heikki Linnakangas wrote:

> Hot standby also has a polling loop where it waits for a
> transaction a transaction to die, though I'm not sure if that can be
> fit into the same model

I prefer that in the context of HS because the Startup process is
waiting for things to die. Given that their death may not be handled
sweetly, I would not wish to rely on that to wake Startup.

In the other two cases you mention all processes are working together
normally and we aren't expecting the other processes to die.

-- Simon Riggs           www.2ndQuadrant.com



Re: An example of bugs for Hot Standby

From
Simon Riggs
Date:
On Wed, 2010-01-20 at 17:40 +0100, Andres Freund wrote:
> > > or similar things with LWLockAcquire in a signal handler
> > 
> > [ grows visibly pale ]  *Please* tell me we are not trying to take
> > locks in a signal handler.  What happens if it interrupts code that
> > is already holding that lock?

> Yes the patch does that at two places.

I think it would be more sensible to discuss specific code and issues,
rather than have general discussions about various horrors.

You've already pointed out that I need to prevent multiple sigalrm
interrupts using boolean flags; I've already said that I would do that.
The use of locks themselves are clearly not a problem, since the
existing sigalrm handler takes LWlocks for deadlock detection. The
problem is just about being called multiple times.

The code in HoldingBufferPinThatDelaysRecovery() also needs protection
against being interrupted multiple times, but we should note that a
second signal of that type is not going to arrive from anywhere inside
the server and requires an explicit user action. The locking isn't
strictly necessary since the value is only read when the only process
that ever writes that value is sleeping on a semaphore. The single
integer value can always be read atomically anyway.

So I will remove the locking in XXXStartupBufferPinWaitBufId(), add in
the booleans and we're done.

-- Simon Riggs           www.2ndQuadrant.com



Re: Synchronization primitives (Was: Re: An example of bugs for Hot Standby)

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Streaming Replication introduces a few places with a polling pattern
>> like this (in pseudocode):
> 
>> while()
>> {
>>   /* Check if variable in shared has advanced beoynd X */
>>   SpinLockAcquire()
>>   localvar = sharedvar;
>>   SpinLockRelease()
>>   if (localvar > X)
>>     break;
> 
>>   /* Not yet. Sleep
>>   pg_usleep(100);
>> }
> 
> I trust there's a CHECK_FOR_INTERRUPTS in there ...
> 
>> It would be nice to have a new synchronization primitive for that.
> 
> Maybe.  The lock, the variable, the comparison operation, and the sleep
> time all seem rather specific to each application.  Not sure that it'd
> really buy much to try to turn it into a generic subroutine.

My point is that we should replace such polling loops with something
non-polling, using wait/signal or semaphores or something. That gets
quite a bit more complex. You'd probably still have the loop, but
instead of pg_usleep() you'd call some new primitive function that waits
until the shared variable changes.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> My point is that we should replace such polling loops with something
> non-polling, using wait/signal or semaphores or something. That gets
> quite a bit more complex. You'd probably still have the loop, but
> instead of pg_usleep() you'd call some new primitive function that waits
> until the shared variable changes.

Maybe someday --- it's certainly not something we need to mess with for
8.5.  As Simon comments, getting it to work nicely in the face of corner
cases (like processes dying unexpectedly) could be a lot of work.
        regards, tom lane


Re: Synchronization primitives (Was: Re: An example of bugs for Hot Standby)

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> My point is that we should replace such polling loops with something
>> non-polling, using wait/signal or semaphores or something. That gets
>> quite a bit more complex. You'd probably still have the loop, but
>> instead of pg_usleep() you'd call some new primitive function that waits
>> until the shared variable changes.
> 
> Maybe someday --- it's certainly not something we need to mess with for
> 8.5.  As Simon comments, getting it to work nicely in the face of corner
> cases (like processes dying unexpectedly) could be a lot of work.

Agreed, polling is good enough for 8.5.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Synchronization primitives (Was: Re: An example of bugs for Hot Standby)

From
David Fetter
Date:
On Wed, Jan 20, 2010 at 09:22:49PM +0200, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> >> My point is that we should replace such polling loops with something
> >> non-polling, using wait/signal or semaphores or something. That gets
> >> quite a bit more complex. You'd probably still have the loop, but
> >> instead of pg_usleep() you'd call some new primitive function that waits
> >> until the shared variable changes.
> > 
> > Maybe someday --- it's certainly not something we need to mess with for
> > 8.5.  As Simon comments, getting it to work nicely in the face of corner
> > cases (like processes dying unexpectedly) could be a lot of work.
> 
> Agreed, polling is good enough for 8.5.

Is this a TODO yet?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


David Fetter <david@fetter.org> writes:
> On Wed, Jan 20, 2010 at 09:22:49PM +0200, Heikki Linnakangas wrote:
>>> My point is that we should replace such polling loops with something
>>> non-polling, using wait/signal or semaphores or something.

> Is this a TODO yet?

It hardly seems concrete enough for a TODO item.
        regards, tom lane


Re: An example of bugs for Hot Standby

From
Hiroyuki Yamada
Date:
>Deadlock bug was prevented by stop-gap measure in December commit.
>
>Full resolution patch attached for Startup process waits on buffer pins.
>
>Startup process sets SIGALRM when waiting on a buffer pin. If woken by
>alarm we send SIGUSR1 to all backends requesting that they check to see
>if they are blocking Startup process. If so, they throw ERROR/FATAL as
>for other conflict resolutions. Deadlock stop gap removed.
>max_standby_delay = -1 option removed to prevent deadlock.
>
>Reviews welcome, otherwise commit at end of week.
>

I think the patch has two problems.
* disable_standby_sig_alarm() does not clear standby_timeout_active flag  when it succeeds in disabling the alarm.
* Assertion check in HoldingBufferPinThatDelaysRecovery() can fail  with following scenario.
  1. Two transactions, xact A and xact B, are running in a HotStandby server.  2. Xact A holds a pin on buffer X.  3.
Startupprocess calls LockBufferForCleanup() for buffer X,     sets ProcGlobal->startupBufferPinWaitBufId = X,     sends
PROCSIG_RECOVERY_CONFLICT_BUFFERPINsignal to both transactions,     and sleeps.  4. Xact A handles the signal,
abortsitself,     releases the pin on buffer X,     and awake startup process.  5. Startup process wakes up     and
setsProcGlobal->startupBufferPinWaitBufId = -1.  6. Xact B handles the signal,     checks
ProcGlobal->startupBufferPinWaitBufId,    and fails in the assertion check in HoldingBufferPinThatDelaysRecovery().
 


regards,

-- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net


Re: An example of bugs for Hot Standby

From
Simon Riggs
Date:
On Thu, 2010-01-21 at 18:01 +0900, Hiroyuki Yamada wrote:

> I think the patch has two problems.
> 
>  * disable_standby_sig_alarm() does not clear standby_timeout_active flag
>    when it succeeds in disabling the alarm.

Ah, thanks.

>  * Assertion check in HoldingBufferPinThatDelaysRecovery() can fail
>    with following scenario.

Agreed. Will remove Assert() because of race conditions.

Thanks,

-- Simon Riggs           www.2ndQuadrant.com



Re: An example of bugs for Hot Standby

From
Simon Riggs
Date:
On Thu, 2010-01-21 at 18:01 +0900, Hiroyuki Yamada wrote:
> >Deadlock bug was prevented by stop-gap measure in December commit.
> >
> >Full resolution patch attached for Startup process waits on buffer pins.
> >
> >Startup process sets SIGALRM when waiting on a buffer pin. If woken by
> >alarm we send SIGUSR1 to all backends requesting that they check to see
> >if they are blocking Startup process. If so, they throw ERROR/FATAL as
> >for other conflict resolutions. Deadlock stop gap removed.
> >max_standby_delay = -1 option removed to prevent deadlock.
> >
> >Reviews welcome, otherwise commit at end of week.
> >
>
> I think the patch has two problems.
>
>  * disable_standby_sig_alarm() does not clear standby_timeout_active flag
>    when it succeeds in disabling the alarm.
>
>  * Assertion check in HoldingBufferPinThatDelaysRecovery() can fail
>    with following scenario.

Updated patch, including changes from Andres and Hiroyuki.

--
 Simon Riggs           www.2ndQuadrant.com

Attachment

Re: An example of bugs for Hot Standby

From
Andres Freund
Date:
On Wednesday 20 January 2010 17:59:36 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I realize its way too late in the cycle for that, but why dont we start
> > using some library for easy cross platform atomic ops?
> 
> (1) there probably isn't one that does exactly what we want, works
> everywhere, and has the right license;
> (2) what actual gain would we get?  We've already done the work.
Another thing would be to have a simple rmb() wmb() instruction...

Andres