Thread: [HACKERS] More race conditions in logical replication

[HACKERS] More race conditions in logical replication

From
Tom Lane
Date:
I noticed a recent failure that looked suspiciously like a race condition:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07

The critical bit in the log file is

error running SQL: 'psql:<stdin>:1: ERROR:  could not drop the replication slot "tap_sub" on publisher
DETAIL:  The error was: ERROR:  replication slot "tap_sub" is active for PID 3866790'
while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql
'DROP SUBSCRIPTION tap_sub' at
/home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.

After poking at it a bit, I found that I can cause several different
failures of this ilk in the subscription tests by injecting delays at
the points where a slot's active_pid is about to be cleared, as in the
attached patch (which also adds some extra printouts for debugging
purposes; none of that is meant for commit).  It seems clear that there
is inadequate interlocking going on when we kill and restart a logical
rep worker: we're trying to start a new one before the old one has
gotten out of the slot.

I'm not particularly interested in fixing this myself, so I'm just
going to add it to the open items list.

            regards, tom lane

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index dc7de20..78cd416 100644
*** a/src/backend/replication/slot.c
--- b/src/backend/replication/slot.c
*************** ReplicationSlotAcquire(const char *name)
*** 358,364 ****
      if (active_pid != MyProcPid)
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_IN_USE),
!                  errmsg("replication slot \"%s\" is active for PID %d",
                          name, active_pid)));

      /* We made this slot active, so it's ours now. */
--- 358,364 ----
      if (active_pid != MyProcPid)
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_IN_USE),
!                  errmsg("replication slot \"%s\" is active for PID %d (acq)",
                          name, active_pid)));

      /* We made this slot active, so it's ours now. */
*************** ReplicationSlotRelease(void)
*** 391,396 ****
--- 391,398 ----
           * Mark persistent slot inactive.  We're not freeing it, just
           * disconnecting.
           */
+         pg_usleep(100000);
+         elog(LOG, "ReplicationSlotRelease: unmarking persistent slot");
          SpinLockAcquire(&slot->mutex);
          slot->active_pid = 0;
          SpinLockRelease(&slot->mutex);
*************** ReplicationSlotDropPtr(ReplicationSlot *
*** 523,528 ****
--- 525,532 ----
      {
          bool        fail_softly = slot->data.persistency != RS_PERSISTENT;

+         pg_usleep(100000);
+         elog(LOG, "ReplicationSlotDropPtr: unmarking slot after rename fail");
          SpinLockAcquire(&slot->mutex);
          slot->active_pid = 0;
          SpinLockRelease(&slot->mutex);
*************** ReplicationSlotDropPtr(ReplicationSlot *
*** 540,545 ****
--- 544,551 ----
       * and nobody can be attached to this slot and thus access it without
       * scanning the array.
       */
+     pg_usleep(100000);
+     elog(LOG, "ReplicationSlotDropPtr: unmarking slot");
      LWLockAcquire(ReplicationSlotControlLock, LW_EXCLUSIVE);
      slot->active_pid = 0;
      slot->in_use = false;
*************** restart:
*** 876,882 ****
          if (active_pid)
              ereport(ERROR,
                      (errcode(ERRCODE_OBJECT_IN_USE),
!                      errmsg("replication slot \"%s\" is active for PID %d",
                              slotname, active_pid)));

          /*
--- 882,888 ----
          if (active_pid)
              ereport(ERROR,
                      (errcode(ERRCODE_OBJECT_IN_USE),
!                      errmsg("replication slot \"%s\" is active for PID %d (drop)",
                              slotname, active_pid)));

          /*


Re: [HACKERS] More race conditions in logical replication

From
Noah Misch
Date:
On Sun, Jul 02, 2017 at 07:54:48PM -0400, Tom Lane wrote:
> I noticed a recent failure that looked suspiciously like a race condition:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
> 
> The critical bit in the log file is
> 
> error running SQL: 'psql:<stdin>:1: ERROR:  could not drop the replication slot "tap_sub" on publisher
> DETAIL:  The error was: ERROR:  replication slot "tap_sub" is active for PID 3866790'
> while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql
> 'DROP SUBSCRIPTION tap_sub' at
> /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
> 
> After poking at it a bit, I found that I can cause several different
> failures of this ilk in the subscription tests by injecting delays at
> the points where a slot's active_pid is about to be cleared, as in the
> attached patch (which also adds some extra printouts for debugging
> purposes; none of that is meant for commit).  It seems clear that there
> is inadequate interlocking going on when we kill and restart a logical
> rep worker: we're trying to start a new one before the old one has
> gotten out of the slot.
> 
> I'm not particularly interested in fixing this myself, so I'm just
> going to add it to the open items list.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Noah Misch wrote:

> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.

I volunteer to own this item.  My next update is going to be on or
before Friday 7th at 19:00 Chilean time, though I don't think I can have
this ready before then.  I expect a fix for this to miss next week's
beta release, but I'll try to get it done sooner.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Petr Jelinek
Date:
On 03/07/17 01:54, Tom Lane wrote:
> I noticed a recent failure that looked suspiciously like a race condition:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
> 
> The critical bit in the log file is
> 
> error running SQL: 'psql:<stdin>:1: ERROR:  could not drop the replication slot "tap_sub" on publisher
> DETAIL:  The error was: ERROR:  replication slot "tap_sub" is active for PID 3866790'
> while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql
> 'DROP SUBSCRIPTION tap_sub' at
> /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
> 
> After poking at it a bit, I found that I can cause several different
> failures of this ilk in the subscription tests by injecting delays at
> the points where a slot's active_pid is about to be cleared, as in the
> attached patch (which also adds some extra printouts for debugging
> purposes; none of that is meant for commit).  It seems clear that there
> is inadequate interlocking going on when we kill and restart a logical
> rep worker: we're trying to start a new one before the old one has
> gotten out of the slot.
> 

Thanks for the test case.

It's not actually that we start the new worker too fast. It's that we
try to drop the slot right after the worker process was killed, but if
the code that clears the slot's active_pid takes too long, the slot
still looks like it's in use. I am quite sure it's possible to make
this happen for physical replication as well when using slots.

This is not something that can be solved by locking on the subscriber.
ISTM we need to make pg_drop_replication_slot behave more nicely, like
making it wait for the slot to become available (either by default or
as an option).

I'll have to think about how to do it without rewriting half of
replication slots or reimplementing a lock queue, though, because
replication slots don't use normal catalog access, so there is no
object locking with a wait queue. We could use a latch wait with a
small timeout, but that seems ugly, as the function can be called by a
user while the slot is still in active use, so the wait can be quite
long (as in "forever").

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Petr Jelinek
Date:
On 06/07/17 17:33, Petr Jelinek wrote:
> On 03/07/17 01:54, Tom Lane wrote:
>> I noticed a recent failure that looked suspiciously like a race condition:
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
>>
>> The critical bit in the log file is
>>
>> error running SQL: 'psql:<stdin>:1: ERROR:  could not drop the replication slot "tap_sub" on publisher
>> DETAIL:  The error was: ERROR:  replication slot "tap_sub" is active for PID 3866790'
>> while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql
>> 'DROP SUBSCRIPTION tap_sub' at
>> /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
>>
>> After poking at it a bit, I found that I can cause several different
>> failures of this ilk in the subscription tests by injecting delays at
>> the points where a slot's active_pid is about to be cleared, as in the
>> attached patch (which also adds some extra printouts for debugging
>> purposes; none of that is meant for commit).  It seems clear that there
>> is inadequate interlocking going on when we kill and restart a logical
>> rep worker: we're trying to start a new one before the old one has
>> gotten out of the slot.
>>
> 
> Thanks for the test case.
> 
> It's not actually that we start the new worker too fast. It's that we
> try to drop the slot right after the worker process was killed, but if
> the code that clears the slot's active_pid takes too long, the slot
> still looks like it's in use. I am quite sure it's possible to make
> this happen for physical replication as well when using slots.
> 
> This is not something that can be solved by locking on the subscriber.
> ISTM we need to make pg_drop_replication_slot behave more nicely, like
> making it wait for the slot to become available (either by default or
> as an option).
> 
> I'll have to think about how to do it without rewriting half of
> replication slots or reimplementing a lock queue, though, because
> replication slots don't use normal catalog access, so there is no
> object locking with a wait queue. We could use a latch wait with a
> small timeout, but that seems ugly, as the function can be called by a
> user while the slot is still in active use, so the wait can be quite
> long (as in "forever").
> 

A naive fix would be something like the attached. But as I said, it's not
exactly pretty.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] More race conditions in logical replication

From
Petr Jelinek
Date:
On 06/07/17 18:20, Petr Jelinek wrote:
> On 06/07/17 17:33, Petr Jelinek wrote:
>> On 03/07/17 01:54, Tom Lane wrote:
>>> I noticed a recent failure that looked suspiciously like a race condition:
>>>
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
>>>
>>> The critical bit in the log file is
>>>
>>> error running SQL: 'psql:<stdin>:1: ERROR:  could not drop the replication slot "tap_sub" on publisher
>>> DETAIL:  The error was: ERROR:  replication slot "tap_sub" is active for PID 3866790'
>>> while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql
>>> 'DROP SUBSCRIPTION tap_sub' at
>>> /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
>>>
>>> After poking at it a bit, I found that I can cause several different
>>> failures of this ilk in the subscription tests by injecting delays at
>>> the points where a slot's active_pid is about to be cleared, as in the
>>> attached patch (which also adds some extra printouts for debugging
>>> purposes; none of that is meant for commit).  It seems clear that there
>>> is inadequate interlocking going on when we kill and restart a logical
>>> rep worker: we're trying to start a new one before the old one has
>>> gotten out of the slot.
>>>
>>
>> Thanks for the test case.
>>
>> It's not actually that we start the new worker too fast. It's that we
>> try to drop the slot right after the worker process was killed, but if
>> the code that clears the slot's active_pid takes too long, the slot
>> still looks like it's in use. I am quite sure it's possible to make
>> this happen for physical replication as well when using slots.
>>
>> This is not something that can be solved by locking on the subscriber.
>> ISTM we need to make pg_drop_replication_slot behave more nicely, like
>> making it wait for the slot to become available (either by default or
>> as an option).
>>
>> I'll have to think about how to do it without rewriting half of
>> replication slots or reimplementing a lock queue, though, because
>> replication slots don't use normal catalog access, so there is no
>> object locking with a wait queue. We could use a latch wait with a
>> small timeout, but that seems ugly, as the function can be called by a
>> user while the slot is still in active use, so the wait can be quite
>> long (as in "forever").
>>
> 
> A naive fix would be something like the attached. But as I said, it's not
> exactly pretty.
> 

So the best idea I could come up with is to make use of the new condition
variable API. That lets us wait on a variable which can be set per slot.

It's not backportable, however; I am not sure how big a problem that is
considering the lack of complaints until now (maybe we could backport
using the ugly timeout version?).

The attached patch is a prototype of such a solution, and there are some
race conditions (the variable can get signaled before the waiting process
starts sleeping on it). I am mainly sending it to get feedback on the
approach.
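
To give an idea of the shape of it (a simplified sketch, not the
prototype verbatim; the member name and the PG_WAIT_LOCK classification
are just what I used for now):

    /* each ReplicationSlot gains a CV, initialized along with the rest of shmem */
    ConditionVariable active_cv;    /* broadcast whenever active_pid changes */

    /* ReplicationSlotAcquire(): wait instead of erroring out right away */
    for (;;)
    {
        SpinLockAcquire(&s->mutex);
        active_pid = s->active_pid;
        if (active_pid == 0)
            active_pid = s->active_pid = MyProcPid;
        SpinLockRelease(&s->mutex);

        if (active_pid == MyProcPid)
            break;              /* we own the slot now */

        /* held by someone else: sleep until their release broadcasts */
        ConditionVariableSleep(&s->active_cv, PG_WAIT_LOCK);
    }
    ConditionVariableCancelSleep();

    /* ReplicationSlotRelease() / drop path, after clearing active_pid: */
    ConditionVariableBroadcast(&s->active_cv);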

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Petr Jelinek wrote:

> So the best idea I could come up with is to make use of the new condition
> variable API. That lets us wait on a variable which can be set per slot.
> 
> It's not backportable, however; I am not sure how big a problem that is
> considering the lack of complaints until now (maybe we could backport
> using the ugly timeout version?).
> 
> The attached patch is a prototype of such a solution, and there are some
> race conditions (the variable can get signaled before the waiting process
> starts sleeping on it). I am mainly sending it to get feedback on the
> approach.

This one looks a lot better than the original one.

I wonder if it's possible to do this using lwlocks instead of condition
variables (similar to how we do the "wait for IO" thing both for slots
and buffers).  We could backport that easily ...

I'll next update this on or before Monday 10th at 19:00 CLT.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I'll next update this on or before Monday 10th at 19:00 CLT.

Schedule note --- we'll be wrapping beta2 on Monday, a couple hours
before that.  Although it'd be great to have this issue fixed before
beta2, jamming in a patch just a few hours before wrap is probably
not prudent.  If you can't get it done over the weekend, I'd counsel
holding off till after beta2 is tagged.
        regards, tom lane



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> I'll next update this on or before Monday 10th at 19:00 CLT.

I couldn't get to this today as I wanted.  Next update on Wednesday
12th, same time.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Petr Jelinek wrote:

> So the best idea I could come up with is to make use of the new condition
> variable API. That lets us wait on a variable which can be set per slot.

Here's a cleaned up version of that patch, which I intend to get in the
tree tomorrow.  I verified that this allows the subscription tests to
pass with Tom's sleep additions.

I noticed one point where we're reading the active_pid without locking;
marked it with an XXX comment.  Not yet sure whether this is a bug or
not.


I noticed something funny in CREATE_REPLICATION_SLOT; apparently we first
create a slot and set it active, then we activate it by name.  With the
current coding it seems to work fine, because we're careful to make
activation idempotent, but it looks convoluted.  I don't think this is
new, though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] More race conditions in logical replication

From
Noah Misch
Date:
On Wed, Jul 12, 2017 at 06:48:28PM -0400, Alvaro Herrera wrote:
> Petr Jelinek wrote:
> 
> > So the best idea I could come up with is to make use of the new condition
> > variable API. That lets us wait on a variable which can be set per slot.
> 
> Here's a cleaned up version of that patch, which I intend to get in the
> tree tomorrow.  I verified that this allows the subscription tests to
> pass with Tom's sleep additions.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
After going over this a bunch more times, I found other problems.  For
example, I noticed that if I create a temporary slot in one session,
then another session would rightly go to sleep if it tries to drop that
slot.  But the slot goes away when the first session disconnects, so I'd
expect the sleeping session to get a signal and wake up, but that
doesn't happen.
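
Presumably the cleanup that runs at backend exit has to broadcast on the
slot's CV as well, i.e. something like this in ReplicationSlotCleanup()
(a rough sketch of what I think is missing):

    /* after releasing/dropping our temporary slot, wake any waiters */
    SpinLockAcquire(&s->mutex);
    s->active_pid = 0;
    SpinLockRelease(&s->mutex);
    ConditionVariableBroadcast(&s->active_cv);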

I'm going to continue to look at this and report back Tuesday 18th.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> After going over this a bunch more times, I found other problems.  For
> example, I noticed that if I create a temporary slot in one session,
> then another session would rightly go to sleep if it tries to drop that
> slot.  But the slot goes away when the first session disconnects, so I'd
> expect the sleeping session to get a signal and wake up, but that
> doesn't happen.
> 
> I'm going to continue to look at this and report back Tuesday 18th.

I got tangled up in a really tough problem this week, so I won't have
time to work on this for a couple of days.  I can next update tomorrow
19:00 CLT (probably not with a final fix yet, though).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
I'm back at looking into this again, after a rather exhausting week.  I
think I have a better grasp of what is going on in this code now, and it
appears to me that we should change the locking so that active_pid is
protected by ReplicationSlotControlLock instead of the slot's spinlock;
but I haven't analyzed the idea fully yet and I don't have the patch
done yet either.  I'm going to report, hopefully with a fix committed
this time, on Monday at 19:00 CLT.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> I'm back at looking into this again, after a rather exhausting week.  I
> think I have a better grasp of what is going on in this code now,

Here's an updated patch, which I intend to push tomorrow.

> and it
> appears to me that we should change the locking so that active_pid is
> protected by ReplicationSlotControlLock instead of the slot's spinlock;
> but I haven't analyzed the idea fully yet and I don't have the patch
> done yet either.

This doesn't seem to be a good idea actually.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] More race conditions in logical replication

From
Petr Jelinek
Date:
On 25/07/17 01:33, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>> I'm back at looking into this again, after a rather exhausting week.  I
>> think I have a better grasp of what is going on in this code now,
> 
> Here's an updated patch, which I intend to push tomorrow.
> 

I think the ConditionVariablePrepareToSleep() call in
ReplicationSlotAcquire() needs to be done inside the mutex lock;
otherwise there is a tiny race condition where the process which has the
slot acquired might release it between the mutex unlock and the
ConditionVariablePrepareToSleep() call, which means we'll never get
signaled and ConditionVariableSleep() will wait forever.

Otherwise the patch looks good to me.

As a side note, ConditionVariablePrepareToSleep()'s comment could be
improved, because currently it says the only advantage is that we skip
the double test at the beginning of ConditionVariableSleep(). But that's
not true; it's essential for preventing race conditions like the one
above, because it puts the current process onto the wait list, so we can
be sure it will be signaled on broadcast once
ConditionVariablePrepareToSleep() has been called.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Petr Jelinek wrote:
> On 25/07/17 01:33, Alvaro Herrera wrote:
> > Alvaro Herrera wrote:
> >> I'm back at looking into this again, after a rather exhausting week.  I
> >> think I have a better grasp of what is going on in this code now,
> > 
> > Here's an updated patch, which I intend to push tomorrow.
> 
> I think the ConditionVariablePrepareToSleep() call in
> ReplicationSlotAcquire() needs to be done inside the mutex lock;
> otherwise there is a tiny race condition where the process which has the
> slot acquired might release it between the mutex unlock and the
> ConditionVariablePrepareToSleep() call, which means we'll never get
> signaled and ConditionVariableSleep() will wait forever.

Hmm, yeah, that's not good.  However, I didn't like the idea of putting
it inside the locked area, as it's too much code.  Instead I added it just
before acquiring the spinlock.  We cancel the sleep unconditionally
later on if we didn't need to sleep after all.
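
So the flow in ReplicationSlotAcquire() ends up roughly like this
(abridged sketch; the slot-search loop is omitted, and the nowait flag is
how a caller chooses between waiting and the old error behavior):

    retry:
        /*
         * Get ready to sleep *before* testing active_pid, so that a release
         * happening in between cannot broadcast before we're on the CV's
         * wait list.
         */
        ConditionVariablePrepareToSleep(&s->active_cv);

        SpinLockAcquire(&s->mutex);
        active_pid = s->active_pid;
        if (active_pid == 0)
            active_pid = s->active_pid = MyProcPid;
        SpinLockRelease(&s->mutex);

        if (active_pid != MyProcPid)
        {
            if (!nowait)
            {
                /* wait for the current holder to broadcast, then start over */
                ConditionVariableSleep(&s->active_cv, PG_WAIT_LOCK);
                ConditionVariableCancelSleep();
                goto retry;
            }
            ereport(ERROR,
                    (errcode(ERRCODE_OBJECT_IN_USE),
                     errmsg("replication slot \"%s\" is active for PID %d",
                            name, active_pid)));
        }
        else
            ConditionVariableCancelSleep();     /* didn't need to sleep */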

As I see it, we need to backpatch at least parts of this patch.  I've
received reports that in earlier branches pglogical and BDR can
sometimes leave slots behind when removing nodes, and I have a hunch
that it can be explained by the bugs being fixed here.  Now we cannot
use condition variables in back-branches, so we'll need to figure out
how to actually do it ...

> As a side note, ConditionVariablePrepareToSleep()'s comment could be
> improved, because currently it says the only advantage is that we skip
> the double test at the beginning of ConditionVariableSleep(). But that's
> not true; it's essential for preventing race conditions like the one
> above, because it puts the current process onto the wait list, so we can
> be sure it will be signaled on broadcast once
> ConditionVariablePrepareToSleep() has been called.

Hmm.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Robert Haas
Date:
On Tue, Jul 25, 2017 at 1:42 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> As I see it, we need to backpatch at least parts of this patch.  I've
> received reports that in earlier branches pglogical and BDR can
> sometimes leave slots behind when removing nodes, and I have a hunch
> that it can be explained by the bugs being fixed here.  Now we cannot
> use condition variables in back-branches, so we'll need to figure out
> how to actually do it ...

If all you had to back-patch was the condition variable code itself,
that might not really be all that bad, but it depends on the
WaitEventSet stuff, which I think is far too dangerous to back-patch.
However, you might be able to create a dumber, slower version that
only uses WaitLatch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] More race conditions in logical replication

From
Robert Haas
Date:
On Tue, Jul 25, 2017 at 5:47 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> As a side note, ConditionVariablePrepareToSleep()'s comment could be
> improved, because currently it says the only advantage is that we skip
> the double test at the beginning of ConditionVariableSleep(). But that's
> not true; it's essential for preventing race conditions like the one
> above, because it puts the current process onto the wait list, so we can
> be sure it will be signaled on broadcast once
> ConditionVariablePrepareToSleep() has been called.

But if you don't call ConditionVariablePrepareToSleep() before calling
ConditionVariableSleep(), then the first call to the latter will call
the former and return without doing anything else.  So I don't see how
this can ever go wrong if you're using these primitives as documented.
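
That is, the documented pattern is a retest loop, something like the
following (slot_is_free() here just stands in for whatever condition is
being waited on):

    for (;;)
    {
        if (slot_is_free())             /* re-test the condition */
            break;

        /*
         * The first call only registers us on the CV's wait list and
         * returns immediately; subsequent calls actually sleep until
         * we're signaled.
         */
        ConditionVariableSleep(&cv, PG_WAIT_LOCK);
    }
    ConditionVariableCancelSleep();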

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Hmm, yeah, that's not good.  However, I didn't like the idea of putting
> it inside the locked area, as it's too much code.  Instead I added it just
> before acquiring the spinlock.  We cancel the sleep unconditionally
> later on if we didn't need to sleep after all.

I just noticed that Jacana failed again in the subscription test, and it
looks like it's related to this.  I'll take a look tomorrow if no one
beats me to it.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2017-07-26%2014%3A39%3A54

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> 
> > Hmm, yeah, that's not good.  However, I didn't like the idea of putting
> > it inside the locked area, as it's too much code.  Instead I added it just
> > before acquiring the spinlock.  We cancel the sleep unconditionally
> > later on if we didn't need to sleep after all.
> 
> I just noticed that Jacana failed again in the subscription test, and it
> looks like it's related to this.  I'll take a look tomorrow if no one
> beats me to it.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2017-07-26%2014%3A39%3A54

Ahh, I misread the message.  This one is actually about the replication
*origin* being still active, not the replication *slot*.  I suppose
origins need the same treatment as we just did for slots.  Anybody want
to volunteer a patch?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> Alvaro Herrera wrote:

> > I just noticed that Jacana failed again in the subscription test, and it
> > looks like it's related to this.  I'll take a look tomorrow if no one
> > beats me to it.
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2017-07-26%2014%3A39%3A54
> 
> Ahh, I misread the message.  This one is actually about the replication
> *origin* being still active, not the replication *slot*.  I suppose
> origins need the same treatment as we just did for slots.  Anybody want
> to volunteer a patch?

So here's a patch.  I was able to reproduce the issue locally by adding
a couple of sleeps, just like Tom did in the case of replication slots.
With this patch it seems the problem is solved.  The mechanism is the
same as Petr used to fix replication slots -- if an origin is busy,
sleep on the CV until someone else signals us; and each time the
acquirer PID changes, broadcast a signal.  Like before, this is likely
to be a problem in older branches too, but we have no CVs to backpatch
this.
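
In code form, the origin side of it is roughly this (abridged sketch; the
exact field names in the patch may differ):

    /* each ReplicationState entry gains a CV */
    ConditionVariable origin_cv;    /* broadcast whenever acquired_by changes */

    /* replorigin_drop(): wait for the holder instead of erroring out */
    restart:
        LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);
        /* ... locate the ReplicationState for the origin ... */
        if (state->acquired_by != 0)
        {
            /* in use: release the lock, wait for a broadcast, try again */
            LWLockRelease(ReplicationOriginLock);
            ConditionVariableSleep(&state->origin_cv, PG_WAIT_LOCK);
            ConditionVariableCancelSleep();
            goto restart;
        }
        /* ... actually drop it ... */
        LWLockRelease(ReplicationOriginLock);

    /* and wherever acquired_by is set or cleared: */
    ConditionVariableBroadcast(&state->origin_cv);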

BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait
event here (and in the replication slot case) is bogus.  We probably
need something new here.

I found four instances of this problem in buildfarm,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2017-07-26%2014%3A39%3A54
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2017-07-28%2021%3A00%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2017-07-31%2007%3A03%3A20
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2017-08-04%2004%3A34%3A04

Jacana only stopped having it because it broke for unrelated reasons.  I
bet we'll see another failure soon ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] More race conditions in logical replication

From
Robert Haas
Date:
On Mon, Aug 7, 2017 at 8:14 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait
> event here (and in the replication slot case) is bogus.  We probably
> need something new here.

Yeah, if you're adding a new wait point, you should add and document a new
constant in the appropriate section, probably something under
PG_WAIT_IPC in this case.
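
For example (the constant and its display name below are only
placeholders for whatever the patch ends up choosing):

    /* pgstat.h: new member in the WaitEventIPC enum */
    typedef enum WaitEventIPC
    {
        WAIT_EVENT_BGWORKER_SHUTDOWN = PG_WAIT_IPC,
        /* ... existing entries ... */
        WAIT_EVENT_REPLICATION_SLOT_DROP        /* proposed */
    } WaitEventIPC;

    /* pgstat.c: matching display name in pgstat_get_wait_ipc() */
    case WAIT_EVENT_REPLICATION_SLOT_DROP:
        event_name = "ReplicationSlotDrop";
        break;

    /* plus a row in the IPC wait_event table in monitoring.sgml */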

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] More race conditions in logical replication

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Mon, Aug 7, 2017 at 8:14 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait
> > event here (and in the replication slot case) is bogus.  We probably
> > need something new here.
> 
> Yeah, if you're adding a new wait point, you should add and document a new
> constant in the appropriate section, probably something under
> PG_WAIT_IPC in this case.

Here's a patch.  It turned out to be a bit larger than I initially expected.

Wait events are a maintainability fail IMO.  I think we need to rethink
this stuff; using generated files from a single source containing the C
symbol, text name and doc blurb sounds better.  That can wait for pg11
though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] More race conditions in logical replication

From
Michael Paquier
Date:
On Tue, Aug 8, 2017 at 8:11 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Mon, Aug 7, 2017 at 8:14 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> > BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait
>> > event here (and in the replication slot case) is bogus.  We probably
>> > need something new here.
>>
>> Yeah, if you're adding a new wait point, you should add and document a new
>> constant in the appropriate section, probably something under
>> PG_WAIT_IPC in this case.
>
> Here's a patch.  It turned out to be a bit larger than I initially expected.

Álvaro, 030273b7 did not get things completely right. A couple of wait
events have been added in the docs for sections Client, IPC and
Activity but it forgot to update the counters for morerows.  Attached
is a patch to fix all that.
--
Michael


Re: [HACKERS] More race conditions in logical replication

From
Michael Paquier
Date:
On Thu, Aug 10, 2017 at 4:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Aug 8, 2017 at 8:11 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Here's a patch.  It turned out to be a bit larger than I initially expected.
>
> Álvaro, 030273b7 did not get things completely right. A couple of wait
> events have been added in the docs for sections Client, IPC and
> Activity but it forgot to update the counters for morerows.  Attached
> is a patch to fix all that.

As the documentation format is weird because of this commit, I am
adding an open item dedicated to that. Look for example at
WalSenderWaitForWAL in
https://www.postgresql.org/docs/devel/static/monitoring-stats.html.
While looking again, I noticed that one line in the LWLock section is
also weird, so attached is an updated patch.

Thanks,
--
Michael


Re: [HACKERS] More race conditions in logical replication

From
Noah Misch
Date:
On Sat, Aug 12, 2017 at 05:38:29PM +0900, Michael Paquier wrote:
> On Thu, Aug 10, 2017 at 4:22 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Tue, Aug 8, 2017 at 8:11 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >> Here's a patch.  It turned out to be a bit larger than I initially expected.
> >
> > Álvaro, 030273b7 did not get things completely right. A couple of wait
> > events have been added in the docs for sections Client, IPC and
> > Activity but it forgot to update the counters for morerows.  Attached
> > is a patch to fix all that.
> 
> As the documentation format is weird because of this commit, I am
> adding an open item dedicated to that. Look for example at
> WalSenderWaitForWAL in
> https://www.postgresql.org/docs/devel/static/monitoring-stats.html.
> While looking again, I have noticed that one line for the section of
> LWLock is weird, so attached is an updated patch.

Committed.  These counts broke three times in the v10 release cycle.  It's too
bad this defect doesn't cause an error when building the docs.



Re: [HACKERS] More race conditions in logical replication

From
Michael Paquier
Date:
On Sun, Aug 13, 2017 at 10:25 AM, Noah Misch <noah@leadboat.com> wrote:
> Committed.  These counts broke three times in the v10 release cycle.  It's too
> bad this defect doesn't cause an error when building the docs.

That's another argument for generating the table dynamically. Thanks
for the commit.
-- 
Michael