Home > mailing lists

Re: [HACKERS] More race conditions in logical replication - Mailing list pgsql-hackers

From	Alvaro Herrera
Subject	Re: [HACKERS] More race conditions in logical replication
Date	July 25, 2017 23:42:23
Msg-id	20170725174223.cf2z3venmka4gxta@alvherre.pgsql Whole thread Raw
In response to	Re: [HACKERS] More race conditions in logical replication (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Responses	Re: [HACKERS] More race conditions in logical replication Re: [HACKERS] More race conditions in logical replication
List	pgsql-hackers

Tree view

Petr Jelinek wrote:
> On 25/07/17 01:33, Alvaro Herrera wrote:
> > Alvaro Herrera wrote:
> >> I'm back at looking into this again, after a rather exhausting week.  I
> >> think I have a better grasp of what is going on in this code now,
> > 
> > Here's an updated patch, which I intend to push tomorrow.
> 
> I think the ConditionVariablePrepareToSleep() call in
> ReplicationSlotAcquire() needs to be done inside the mutex lock
> otherwise there is tiny race condition where the process which has slot
> acquired might release the slot between the mutex unlock and the
> ConditionVariablePrepareToSleep() call which means we'll never get
> signaled and ConditionVariableSleep() will wait forever.

Hmm, yeah, that's not good.  However, I didn't like the idea of putting
it inside the locked area, as it's too much code.  Instead I added just
before acquiring the spinlock.  We cancel the sleep unconditionally
later on if we didn't need to sleep after all.

As I see it, we need to backpatch at least parts of this patch.  I've
received reports that in earlier branches pglogical and BDR can
sometimes leave slots behind when removing nodes, and I have a hunch
that it can be explained by the bugs being fixed here.  Now we cannot
use condition variables in back-branches, so we'll need to figure out
how to actually do it ...

> As a side note, the ConditionVariablePrepareToSleep()'s comment could be
> improved because currently it says the only advantage is that we skip
> double-test in the beginning of ConditionVariableSleep(). But that's not
> true, it's essential for preventing race conditions like the one above
> because it puts the current process into waiting list so we can be sure
> it will be signaled on broadcast once ConditionVariablePrepareToSleep()
> has been called.

Hmm.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Tom Lane
Date: 25 July 2017, 23:41:34
Subject: Re: [HACKERS] Testlib.pm vs msys

From: "Joshua D. Drake"
Date: 25 July 2017, 23:44:55
Subject: Re: [HACKERS] Create language syntax is not proper in pg_dumpall andnot working using pg_upgrade

Re: [HACKERS] More race conditions in logical replication - Mailing list pgsql-hackers

Previous

Next