Thread: [HACKERS] Race conditions with WAL sender PID lookups

[HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

11 May 2017, 04:48:39

Hi all,

I had my eyes on the WAL sender code this morning, and I have noticed
that walsender.c is not completely consistent with the PID lookups it
does in walsender.c. In two code paths, the PID value is checked
without holding the WAL sender spin lock (WalSndRqstFileReload and
pg_stat_get_wal_senders), which looks like a very bad idea contrary to
what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
doing for ages.

Any thoughts about the patch attached to make things more consistent?
It seems to me that having some safeguards would be nice for
robustness.

That's an old bug, so I am adding that to the next CF.
Thanks,
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

walsnd-pid-races.patch

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Masahiko Sawada

Date:

12 May 2017, 05:44:19

On Thu, May 11, 2017 at 10:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Hi all,
>
> I had my eyes on the WAL sender code this morning, and I have noticed
> that walsender.c is not completely consistent with the PID lookups it
> does in walsender.c. In two code paths, the PID value is checked
> without holding the WAL sender spin lock (WalSndRqstFileReload and
> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
> doing for ages.
>
> Any thoughts about the patch attached to make things more consistent?
> It seems to me that having some safeguards would be nice for
> robustness.

+1. I think this is a sensible change.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Kyotaro HORIGUCHI

Date:

15 May 2017, 06:25:45

At Fri, 12 May 2017 11:44:19 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA15xsu1gbOH=L1XU7g7zuKk1UACtOz+-mqOwP1_xBC_g@mail.gmail.com>
> On Thu, May 11, 2017 at 10:48 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > Hi all,
> >
> > I had my eyes on the WAL sender code this morning, and I have noticed
> > that walsender.c is not completely consistent with the PID lookups it
> > does in walsender.c. In two code paths, the PID value is checked
> > without holding the WAL sender spin lock (WalSndRqstFileReload and
> > pg_stat_get_wal_senders), which looks like a very bad idea contrary to
> > what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
> > doing for ages.
> >
> > Any thoughts about the patch attached to make things more consistent?
> > It seems to me that having some safeguards would be nice for
> > robustness.
> 
> +1. I think this is a sensible change.

It intends to avoid exccesive locking during looking up stats
values. But we don't have so much vacant WanSnd slots in a
reasonable setup. Thus it seems reasonable to read the pid value
within the lock section since it adds practically no additional
cost. pg_stat_get_wal_receiver seems to need the same amendment
since the code is a parallel to that of wal receiver.

Or, if we were too sensitive to such locks for nothing, we could
use double-checked locking but I don't think we are so here.

In short, +1 too and walreceiver needs the same amendment.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Robert Haas

Date:

16 May 2017, 18:14:27

On Wed, May 10, 2017 at 9:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I had my eyes on the WAL sender code this morning, and I have noticed
> that walsender.c is not completely consistent with the PID lookups it
> does in walsender.c. In two code paths, the PID value is checked
> without holding the WAL sender spin lock (WalSndRqstFileReload and
> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
> doing for ages.

Well, it's probably worth changing for consistency, but I'm not sure
that it rises to the level of "a very bad idea".  It actually seems
almost entirely harmless.  Spuriously setting the needreload flag on a
just-deceased WAL sender will just result in some future WAL sender
doing a bit of unnecessary work, but I don't think it breaks anything
and the probability is vanishingly low.  The other change could result
a bogus 0 PID in pg_stat_get_wal_senders output, but I bet you
couldn't reproduce that more than once in a blue moon even with a test
rig designed to provoke it, and if it does happen it isn't really
anything more than a trivial annoyance.

So I'm in favor of committing this and maybe even back-patching it,
but I also don't think it's a big deal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

18 May 2017, 05:25:55

On Wed, May 17, 2017 at 12:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, it's probably worth changing for consistency, but I'm not sure
> that it rises to the level of "a very bad idea".  It actually seems
> almost entirely harmless.  Spuriously setting the needreload flag on a
> just-deceased WAL sender will just result in some future WAL sender
> doing a bit of unnecessary work, but I don't think it breaks anything
> and the probability is vanishingly low.  The other change could result
> a bogus 0 PID in pg_stat_get_wal_senders output, but I bet you
> couldn't reproduce that more than once in a blue moon even with a test
> rig designed to provoke it, and if it does happen it isn't really
> anything more than a trivial annoyance.

Well, the window is very low, so only tests with precisely taken
breakpoints would show problems.

> So I'm in favor of committing this and maybe even back-patching it,
> but I also don't think it's a big deal.

Thanks. I would not mind if this is seen as a HEAD-only improvement.
-- 
Michael

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Thomas Munro

Date:

18 May 2017, 07:43:04

On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I had my eyes on the WAL sender code this morning, and I have noticed
> that walsender.c is not completely consistent with the PID lookups it
> does in walsender.c. In two code paths, the PID value is checked
> without holding the WAL sender spin lock (WalSndRqstFileReload and
> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
> doing for ages.

There is also code that accesses shared walsender state without
spinlocks over in syncrep.c.  I think that file could use a few words
of explanation for why it's OK to access pid, state and flush without
synchronisation.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

19 May 2017, 05:05:45

On Thu, May 18, 2017 at 1:43 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> I had my eyes on the WAL sender code this morning, and I have noticed
>> that walsender.c is not completely consistent with the PID lookups it
>> does in walsender.c. In two code paths, the PID value is checked
>> without holding the WAL sender spin lock (WalSndRqstFileReload and
>> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
>> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
>> doing for ages.
>
> There is also code that accesses shared walsender state without
> spinlocks over in syncrep.c.  I think that file could use a few words
> of explanation for why it's OK to access pid, state and flush without
> synchronisation.

Yes, that is read during the quorum and priority sync evaluation.
Except sync_standby_priority, all the other variables should be
protected using the spin lock of the WAL sender. walsender_private.h
is clear regarding that. So the current coding is inconsistent even
there. Attached is an updated patch.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

walsnd-pid-races-v2.patch

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Masahiko Sawada

Date:

19 May 2017, 09:01:24

On Fri, May 19, 2017 at 11:05 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, May 18, 2017 at 1:43 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> I had my eyes on the WAL sender code this morning, and I have noticed
>>> that walsender.c is not completely consistent with the PID lookups it
>>> does in walsender.c. In two code paths, the PID value is checked
>>> without holding the WAL sender spin lock (WalSndRqstFileReload and
>>> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
>>> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
>>> doing for ages.
>>
>> There is also code that accesses shared walsender state without
>> spinlocks over in syncrep.c.  I think that file could use a few words
>> of explanation for why it's OK to access pid, state and flush without
>> synchronisation.
>
> Yes, that is read during the quorum and priority sync evaluation.
> Except sync_standby_priority, all the other variables should be
> protected using the spin lock of the WAL sender. walsender_private.h
> is clear regarding that. So the current coding is inconsistent even
> there. Attached is an updated patch.

Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
similar fix.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

20 May 2017, 15:40:57

On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
> similar fix.

Actually, now that I look at it, ready_to_display should as well be
protected by the lock of the WAL receiver, so it is incorrectly placed
in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
is lazy as well, and that's new in 10, so we have an open item here
for both of them. And I am the author for both things. No issues
spotted in walreceiverfuncs.c after review.

I am adding an open item so as both issues are fixed in PG10. With the
WAL sender part, I think that this should be a group shot.

So what do you think about the attached?
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

walsnd-pid-races-v3.patch

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Erik Rijkers

Date:

21 May 2017, 07:37:26

On 2017-05-20 14:40, Michael Paquier wrote:
> On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada 
> <sawada.mshk@gmail.com> wrote:
>> Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
>> similar fix.
> 
> Actually, now that I look at it, ready_to_display should as well be
> protected by the lock of the WAL receiver, so it is incorrectly placed
> in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
> is lazy as well, and that's new in 10, so we have an open item here
> for both of them. And I am the author for both things. No issues
> spotted in walreceiverfuncs.c after review.
> 
> I am adding an open item so as both issues are fixed in PG10. With the
> WAL sender part, I think that this should be a group shot.
> 
> So what do you think about the attached?

> [walsnd-pid-races-v3.patch]

With this patch on current master my logical replication tests 
(pgbench-over-logical-replication) run without errors for the first time 
in many days (even weeks).

I'll do still more and longer tests but I have gathered already a long 
streak of successful runs since you posted the patch so I am getting 
convinced this patch is solved the problem that I was experiencing.

Pity it didn't make the beta.

thanks,

Erik Rijkers

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Erik Rijkers

Date:

21 May 2017, 09:19:34

On 2017-05-21 06:37, Erik Rijkers wrote:
> On 2017-05-20 14:40, Michael Paquier wrote:
>> On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada 
>> <sawada.mshk@gmail.com> wrote:
>>> Also, as Horiguchi-san pointed out earlier, walreceiver seems need 
>>> the
>>> similar fix.
>> 
>> Actually, now that I look at it, ready_to_display should as well be
>> protected by the lock of the WAL receiver, so it is incorrectly placed
>> in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
>> is lazy as well, and that's new in 10, so we have an open item here
>> for both of them. And I am the author for both things. No issues
>> spotted in walreceiverfuncs.c after review.
>> 
>> I am adding an open item so as both issues are fixed in PG10. With the
>> WAL sender part, I think that this should be a group shot.
>> 
>> So what do you think about the attached?
> 
>> [walsnd-pid-races-v3.patch]
> 
> 
> With this patch on current master my logical replication tests
> (pgbench-over-logical-replication) run without errors for the first
> time in many days (even weeks).

Unfortunately, just now another logical-replication failure occurred.  
The same as I have seen all along:

The symptom: after starting logical replication, there are no rows in 
pg_stat_replication and in the replica-log logical replication complains 
about max_replication_slots being too low. (from previous experience I 
know that making max_replication_slots higher does indeed 'help', but 
only until the next (same) error occurs, with renewed (same) complaint).

Also from previous experience of this failed state I know that it can be 
'cleaned up' by
manually emptying these tables:  delete from pg_subscription_rel;  delete from pg_subscription;  delete from
pg_replication_origin;
Then it becomes possible to start a new subscription without the above 
symptoms.

I'll do some more testing and hopefully get some information that's less 
vague...

Erik Rijkers

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Noah Misch

Date:

30 May 2017, 04:38:08

On Sat, May 20, 2017 at 09:40:57PM +0900, Michael Paquier wrote:
> On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
> > similar fix.
> 
> Actually, now that I look at it, ready_to_display should as well be
> protected by the lock of the WAL receiver, so it is incorrectly placed
> in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
> is lazy as well, and that's new in 10, so we have an open item here
> for both of them. And I am the author for both things. No issues
> spotted in walreceiverfuncs.c after review.
> 
> I am adding an open item so as both issues are fixed in PG10. With the
> WAL sender part, I think that this should be a group shot.
> 
> So what do you think about the attached?

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Peter Eisentraut

Date:

30 May 2017, 21:09:14

On 5/29/17 21:38, Noah Misch wrote:
>> Actually, now that I look at it, ready_to_display should as well be
>> protected by the lock of the WAL receiver, so it is incorrectly placed
>> in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
>> is lazy as well, and that's new in 10, so we have an open item here
>> for both of them. And I am the author for both things. No issues
>> spotted in walreceiverfuncs.c after review.
>>
>> I am adding an open item so as both issues are fixed in PG10. With the
>> WAL sender part, I think that this should be a group shot.
>>
>> So what do you think about the attached?
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.

I don't think this is my item.  Most of the behavior is old, and
pg_stat_get_wal_receiver() is from commit
b1a9bad9e744857291c7d5516080527da8219854.

(The logical replication equivalent of that is
pg_stat_get_subscription(), which does appear to use appropriate locking.)

I would appreciate if another committer can take the lead on this.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

30 May 2017, 21:13:37

On Tue, May 30, 2017 at 11:09 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I don't think this is my item.  Most of the behavior is old, and
> pg_stat_get_wal_receiver() is from commit
> b1a9bad9e744857291c7d5516080527da8219854.
>
> I would appreciate if another committer can take the lead on this.

Those things are on Alvaro's plate for the WAL receiver portion, and I
was the author of those patches. The WAL sender portion is older
though, but it seems crazy to me to not fix both things at the same
time per their similarities.
-- 
Michael

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Thomas Munro

Date:

31 May 2017, 01:23:56

On Fri, May 19, 2017 at 2:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, May 18, 2017 at 1:43 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> I had my eyes on the WAL sender code this morning, and I have noticed
>>> that walsender.c is not completely consistent with the PID lookups it
>>> does in walsender.c. In two code paths, the PID value is checked
>>> without holding the WAL sender spin lock (WalSndRqstFileReload and
>>> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
>>> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
>>> doing for ages.
>>
>> There is also code that accesses shared walsender state without
>> spinlocks over in syncrep.c.  I think that file could use a few words
>> of explanation for why it's OK to access pid, state and flush without
>> synchronisation.
>
> Yes, that is read during the quorum and priority sync evaluation.
> Except sync_standby_priority, all the other variables should be
> protected using the spin lock of the WAL sender. walsender_private.h
> is clear regarding that. So the current coding is inconsistent even
> there. Attached is an updated patch.

I don't claim that that stuff is wrong, just that it would be good to
hear an analysis.  I think the question is: is there a way for syncrep
to hang because of a perfectly timed walsender transition, however
vanishingly unlikely?  I'm thinking of something like: syncrep skips a
walsender slot because it's looking at arbitrarily old 'pid' from
before a walsender connected, or gets a torn read of 'flush' that
comes out as 0 but was actually non-0.

Incidentally, I suspect that a couple of places where 'volatile' is
used it's superfluous (accessing things protected by an LWLock that is
held).

I don't see any of this as 'open item' material, it's interesting to
look into but it's preexisting code.  As for unlocked reads used for
pg_stat_X views, it seems well established that we're OK with that (at
least for things that the project has decided can be read atomically,
to wit aligned 32 bit values).

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Noah Misch

Date:

02 June 2017, 08:21:28

On Sun, May 21, 2017 at 08:19:34AM +0200, Erik Rijkers wrote:
> On 2017-05-21 06:37, Erik Rijkers wrote:
> >With this patch on current master my logical replication tests
> >(pgbench-over-logical-replication) run without errors for the first
> >time in many days (even weeks).
> 
> Unfortunately, just now another logical-replication failure occurred.  The
> same as I have seen all along:

This creates a rebuttable presumption of logical replication being the cause
of this open item.  (I am not stating an opinion on whether this rebuttable
presumption is true or is false.)  As long as that stands and Alvaro has not
explicitly requested ownership of this open item, Peter owns it.

On Tue, May 30, 2017 at 11:13:37AM -0700, Michael Paquier wrote:
> On Tue, May 30, 2017 at 11:09 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > I don't think this is my item.  Most of the behavior is old, and
> > pg_stat_get_wal_receiver() is from commit
> > b1a9bad9e744857291c7d5516080527da8219854.
> >
> > I would appreciate if another committer can take the lead on this.
> 
> Those things are on Alvaro's plate for the WAL receiver portion, and I
> was the author of those patches. The WAL sender portion is older
> though, but it seems crazy to me to not fix both things at the same
> time per their similarities.

As a 9.6 commit, b1a9bad cannot be the cause of a v10 open item.  If a v10
commit expanded the consequences of a pre-existing bug, the committer of that
v10 work owns this open item.  If the bug's consequences are the same in v9.6
and v10, this is ineligible to be an open item.  Which applies?

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

02 June 2017, 09:11:24

On Fri, Jun 2, 2017 at 2:21 PM, Noah Misch <noah@leadboat.com> wrote:
> On Tue, May 30, 2017 at 11:13:37AM -0700, Michael Paquier wrote:
>> On Tue, May 30, 2017 at 11:09 AM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>> > I don't think this is my item.  Most of the behavior is old, and
>> > pg_stat_get_wal_receiver() is from commit
>> > b1a9bad9e744857291c7d5516080527da8219854.
>> >
>> > I would appreciate if another committer can take the lead on this.
>>
>> Those things are on Alvaro's plate for the WAL receiver portion, and I
>> was the author of those patches. The WAL sender portion is older
>> though, but it seems crazy to me to not fix both things at the same
>> time per their similarities.
>
> As a 9.6 commit, b1a9bad cannot be the cause of a v10 open item.  If a v10
> commit expanded the consequences of a pre-existing bug, the committer of that
> v10 work owns this open item.  If the bug's consequences are the same in v9.6
> and v10, this is ineligible to be an open item.  Which applies?

You are right. Even 1bdae16f is from 9.6. Mea culpa. I have moved that
into the section of older bugs.
-- 
Michael

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Robert Haas

Date:

07 June 2017, 21:31:13

On Sat, May 20, 2017 at 8:40 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
>> similar fix.
>
> Actually, now that I look at it, ready_to_display should as well be
> protected by the lock of the WAL receiver, so it is incorrectly placed
> in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
> is lazy as well, and that's new in 10, so we have an open item here
> for both of them. And I am the author for both things. No issues
> spotted in walreceiverfuncs.c after review.
>
> I am adding an open item so as both issues are fixed in PG10. With the
> WAL sender part, I think that this should be a group shot.
>
> So what do you think about the attached?

I think if you're going to fix it so that we take spinlocks on
MyWalSnd in a bunch of places that we didn't take them before, it
would make sense to fix all the places where we're accessing those
fields without a spinlock instead of only some of them, unless there
are good reasons why we only need it in some cases and not others, in
which case I think the patch should add comments to each place where
the lock was not taken explaining why it's OK.  It didn't take me and
my copy of vi very long to find instances that you didn't change.

I also think that should provide some analysis regarding the question
Thomas asked - are these actually bugs, or are they OK for some
reason?  We shouldn't just plaster the code with spinlock acquisitions
unless it's really necessary.  It seems at least possible that these
changes could cause performance regressions, and it doesn't look like
you've done any testing or provided any analysis indicating whether
that's likely to happen or not.

Basically, I don't think this patch is ready to commit.  We have
neither evidence that it is necessary nor evidence that it is
complete.  I don't want to be unfair, here, but it seems to me you
ought to do a little more legwork before throwing this over the wall
to the pool of committers.

[ Side note: Erik's report on this thread initially seemed to suggest
that we needed this patch to make logical decoding stable.  But my
impression is that this is belied by subsequent developments on other
threads, so my theory is that this patch was never really related to
the problem, but rather than by the time Erik got around to testing
this patch, other fixes had made the problems relatively rare, and the
apparently-improved results with this patch were just chance.  If that
theory is wrong, it would be good to hear about it. ]

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Erik Rijkers

Date:

08 June 2017, 00:00:59

On 2017-06-07 20:31, Robert Haas wrote:

> [...]
> 
> [ Side note: Erik's report on this thread initially seemed to suggest
> that we needed this patch to make logical decoding stable.  But my
> impression is that this is belied by subsequent developments on other
> threads, so my theory is that this patch was never really related to
> the problem, but rather than by the time Erik got around to testing
> this patch, other fixes had made the problems relatively rare, and the
> apparently-improved results with this patch were just chance.  If that
> theory is wrong, it would be good to hear about it. ]

Yes, agreed; I was probably mistaken.

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

08 June 2017, 07:15:02

v

On Thu, Jun 8, 2017 at 3:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think if you're going to fix it so that we take spinlocks on
> MyWalSnd in a bunch of places that we didn't take them before, it
> would make sense to fix all the places where we're accessing those
> fields without a spinlock instead of only some of them, unless there
> are good reasons why we only need it in some cases and not others, in
> which case I think the patch should add comments to each place where
> the lock was not taken explaining why it's OK.  It didn't take me and
> my copy of vi very long to find instances that you didn't change.

I take that you are referring to the two lookups in
WalSndWaitForWal(), one in exec_replication_command(), one in
WalSndLoop() and one in WalSndDone(). First let's remember that all
the fields protected by the spinlock are only updated in a WAL sender
process, so reading them directly is safe in this context: we know
that no other processes will write them. And all those functions are
called only in the context of a WAL sender process. So the copies of
MyWalSnd that you are referring to here don't need a spinlock. Is
there something I am missing?

Actually, perhaps it would make sense to add some Assert(am_walsender)
in this code?

> I also think that should provide some analysis regarding the question
> Thomas asked - are these actually bugs, or are they OK for some
> reason?  We shouldn't just plaster the code with spinlock acquisitions
> unless it's really necessary.  It seems at least possible that these
> changes could cause performance regressions, and it doesn't look like
> you've done any testing or provided any analysis indicating whether
> that's likely to happen or not.

Now taking a step back on the patch, v3 that I sent 3 weeks ago.
Additional spinlocks are taken in:
1) SyncRepReleaseWaiters(), which is called in a WAL sender. So indeed
no spinlocks are needed here. The last patch is incorrect here.
2) SyncRepGetSyncStandbysPriority() and SyncRepGetSyncStandbysQuorum()
as of HEAD via SyncRepGetSyncStandbys(), something that's not good if
done lock-less as this gets used for pg_stat_replication for
pg_stat_get_wal_senders(). This can return a garbage list of sync
standbys to the client as WAL senders update their flush, write and
apply positions in parallel to that, and don't care about SyncRepLock
as this gets calculated with the standby feedback messages. The
problem here is the gap between walsnd->sync_standby_priority and
walsnd->flush:
-- With a primary, and more or more sync standbys, take a checkpoint
in SyncRepGetSyncStandbysPriority() after the first check is passed.
-- Shutdown the standby and restart it.
-- The standby reconnects, initializes the WAL sender slot with
InitWalSenderSlot().
-- Let the process of SyncRepGetSyncStandbysPriority() continue, the
sync standby does not show up.
That would be really unlikely to happen, but the code is written in
such a way that it could happen. One could also say that this gives a
closer idea that the standby disconnected but it looks wrong to me to
do this value lookup without a required lock.
3) In walreceiver.c's pg_stat_get_wal_receiver's:
- Launch pg_stat_get_wal_receiver and take a checkpoint on it.
- Pass the lookups of pid and ready_to_display.
- Make the WAL receiver stop.
- The view reports inconsistent data. If the WAL receiver data was
just initialized, the caller would get back PID values or similar 0.
Particularly a WAL receiver marked with !ready_to_display could show
data to the caller. That's not cool.
4) KeepFileRestoredFromArchive() needs a lock, as this is called from
the startup process. That's harmless, the worst that can happen is
that needreload is switched to true for a process that has been marked
with a PID of 0 because of a WAL sender restart (slot taken again and
initialized). But that will just process an extra reload in a worst
case.

> Basically, I don't think this patch is ready to commit.  We have
> neither evidence that it is necessary nor evidence that it is
> complete.  I don't want to be unfair, here, but it seems to me you
> ought to do a little more legwork before throwing this over the wall
> to the pool of committers.

Fair enough. Is the analysis written above more convincing?
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

walsnd-pid-races-v4.patch

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

13 June 2017, 05:16:54

On Thu, Jun 8, 2017 at 1:15 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> 3) In walreceiver.c's pg_stat_get_wal_receiver's:
> - Launch pg_stat_get_wal_receiver and take a checkpoint on it.
> - Pass the lookups of pid and ready_to_display.
> - Make the WAL receiver stop.
> - The view reports inconsistent data. If the WAL receiver data was
> just initialized, the caller would get back PID values or similar 0.
> Particularly a WAL receiver marked with !ready_to_display could show
> data to the caller. That's not cool.

This can actually leak password information to any user who is a
member of the group DEFAULT_ROLE_READ_ALL_STATS, though the window to
put hands on this password information is very small, it can be
possible if the WAL receiver gets down and restarted for a reason or
another during a maintenance window for example:
1) The user queries pg_stat_wal_receiver, for example take a
breakpoint just after the check on walrcv->ready_to_display.
2) Restart the primary once, forcing the standby to spawn a new WAL receiver.
3) Breakpoint on the WAL receiver process, with something like that to help:
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -243,6 +243,7 @@ WalReceiverMain(void)
   /* Fetch information required to start streaming */   walrcv->ready_to_display = false;
+   pg_usleep(10000000L); /* 10s */   strlcpy(conninfo, (char *) walrcv->conninfo, MAXCONNINFO);   strlcpy(slotname,
(char*) walrcv->slotname, NAMEDATALEN);   startpoint = walrcv->receiveStart;
 
4) continue the session querying pg_stat_wal_receiver, at this stage
it has information with the WAL receiver data set as
!ready_to_display. If the connection string includes a password, this
becomes visible as well.

If queried at high frequency, pg_stat_wal_receiver may show up some
information. Postgres 9.6 includes this leak as well, but there is no real
leak non-superusers will just see NULL fields for this view. In Postgres
10 though, any member of this group for statistics can see leaking
information. Based on that, this is an open item, so I have added it back
now to the list, moved from the section for older bugs.
-- 
Michael

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Noah Misch

Date:

15 June 2017, 04:59:46

On Tue, Jun 13, 2017 at 11:16:54AM +0900, Michael Paquier wrote:
> On Thu, Jun 8, 2017 at 1:15 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > 3) In walreceiver.c's pg_stat_get_wal_receiver's:
> > - Launch pg_stat_get_wal_receiver and take a checkpoint on it.
> > - Pass the lookups of pid and ready_to_display.
> > - Make the WAL receiver stop.
> > - The view reports inconsistent data. If the WAL receiver data was
> > just initialized, the caller would get back PID values or similar 0.
> > Particularly a WAL receiver marked with !ready_to_display could show
> > data to the caller. That's not cool.
> 
> This can actually leak password information to any user who is a
> member of the group DEFAULT_ROLE_READ_ALL_STATS, though the window to
> put hands on this password information is very small, it can be
> possible if the WAL receiver gets down and restarted for a reason or
> another during a maintenance window for example:
> 1) The user queries pg_stat_wal_receiver, for example take a
> breakpoint just after the check on walrcv->ready_to_display.
> 2) Restart the primary once, forcing the standby to spawn a new WAL receiver.
> 3) Breakpoint on the WAL receiver process, with something like that to help:
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -243,6 +243,7 @@ WalReceiverMain(void)
> 
>     /* Fetch information required to start streaming */
>     walrcv->ready_to_display = false;
> +   pg_usleep(10000000L); /* 10s */
>     strlcpy(conninfo, (char *) walrcv->conninfo, MAXCONNINFO);
>     strlcpy(slotname, (char *) walrcv->slotname, NAMEDATALEN);
>     startpoint = walrcv->receiveStart;
> 4) continue the session querying pg_stat_wal_receiver, at this stage
> it has information with the WAL receiver data set as
> !ready_to_display. If the connection string includes a password, this
> becomes visible as well.
> 
> If queried at high frequency, pg_stat_wal_receiver may show up some
> information. Postgres 9.6 includes this leak as well, but there is no real
> leak non-superusers will just see NULL fields for this view. In Postgres
> 10 though, any member of this group for statistics can see leaking
> information. Based on that, this is an open item, so I have added it back
> now to the list, moved from the section for older bugs.

Formally, the causative commit is the one that removed the superuser() test,
namely 25fff40.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Simon,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Kyotaro HORIGUCHI

Date:

15 June 2017, 10:51:00

Hi,

At Thu, 8 Jun 2017 13:15:02 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTfu2P1QCZB22LJG+P4ET-pS0yW+qSTWO9JcVcCtaPZAA@mail.gmail.com>
> On Thu, Jun 8, 2017 at 3:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > I think if you're going to fix it so that we take spinlocks on
> > MyWalSnd in a bunch of places that we didn't take them before, it
> > would make sense to fix all the places where we're accessing those
> > fields without a spinlock instead of only some of them, unless there
> > are good reasons why we only need it in some cases and not others, in
> > which case I think the patch should add comments to each place where
> > the lock was not taken explaining why it's OK.  It didn't take me and
> > my copy of vi very long to find instances that you didn't change.
> 
> I take that you are referring to the two lookups in
> WalSndWaitForWal(), one in exec_replication_command(), one in
> WalSndLoop() and one in WalSndDone(). First let's remember that all
> the fields protected by the spinlock are only updated in a WAL sender
> process, so reading them directly is safe in this context: we know
> that no other processes will write them. And all those functions are
> called only in the context of a WAL sender process. So the copies of
> MyWalSnd that you are referring to here don't need a spinlock. Is
> there something I am missing?
> 
> Actually, perhaps it would make sense to add some Assert(am_walsender)
> in this code?

It is not obvious that reading MyWalSnd without spinlock is safe
without knowing that it is modified only by itself. Based on that
I see two ways to go.

1. Put spinlock there (WalSndWaitForWal and..) even though it is  not required. It ensures safety even if it is
modifiedby  other processes for possible additional conflict with other  readers. (Unless someone careless forgets lock
fornew code).
 

2. Put a comment there instead (maybe assertion is not verbose  enough), containing something like the mention above.


By the way, this this discussion make me think of
SyncRepInitConfig. Taking SyncRepLock seems unnecessary there to
me. 

> > I also think that should provide some analysis regarding the question
> > Thomas asked - are these actually bugs, or are they OK for some
> > reason?  We shouldn't just plaster the code with spinlock acquisitions
> > unless it's really necessary.  It seems at least possible that these
> > changes could cause performance regressions, and it doesn't look like
> > you've done any testing or provided any analysis indicating whether
> > that's likely to happen or not.
> 
> Now taking a step back on the patch, v3 that I sent 3 weeks ago.
> Additional spinlocks are taken in:
> 1) SyncRepReleaseWaiters(), which is called in a WAL sender. So indeed
> no spinlocks are needed here. The last patch is incorrect here.

Agreed that it is not necessary, but we might should put one
depending on the conclusion of the discussion.

> 2) SyncRepGetSyncStandbysPriority() and SyncRepGetSyncStandbysQuorum()
> as of HEAD via SyncRepGetSyncStandbys(), something that's not good if
> done lock-less as this gets used for pg_stat_replication for
> pg_stat_get_wal_senders(). This can return a garbage list of sync
> standbys to the client as WAL senders update their flush, write and
> apply positions in parallel to that, and don't care about SyncRepLock
> as this gets calculated with the standby feedback messages. The
> problem here is the gap between walsnd->sync_standby_priority and
> walsnd->flush:
> -- With a primary, and more or more sync standbys, take a checkpoint
> in SyncRepGetSyncStandbysPriority() after the first check is passed.
> -- Shutdown the standby and restart it.
> -- The standby reconnects, initializes the WAL sender slot with
> InitWalSenderSlot().
> -- Let the process of SyncRepGetSyncStandbysPriority() continue, the
> sync standby does not show up.
> That would be really unlikely to happen, but the code is written in
> such a way that it could happen. One could also say that this gives a
> closer idea that the standby disconnected but it looks wrong to me to
> do this value lookup without a required lock.

It doesn't seem to be a problem for
pg_stat_get_wal_senders(). For SyncRepReleaseWaiters, also it
doesn't seem to be a problem because next reply message runs the
process again.

> 3) In walreceiver.c's pg_stat_get_wal_receiver's:
> - Launch pg_stat_get_wal_receiver and take a checkpoint on it.
> - Pass the lookups of pid and ready_to_display.
> - Make the WAL receiver stop.
> - The view reports inconsistent data. If the WAL receiver data was
> just initialized, the caller would get back PID values or similar 0.
> Particularly a WAL receiver marked with !ready_to_display could show
> data to the caller. That's not cool.

Totally agreed.

> 4) KeepFileRestoredFromArchive() needs a lock, as this is called from
> the startup process. That's harmless, the worst that can happen is
> that needreload is switched to true for a process that has been marked
> with a PID of 0 because of a WAL sender restart (slot taken again and
> initialized). But that will just process an extra reload in a worst
> case.

Agreed to the disucussion but it doesn't seem to me to be
necessary, however, seems better to have.

By the way InitWalSenderSlot seems fogetting to initialize
needreload and sync_standby_priority. The initialization code
would be better to be arranged in the same order with its
defintion. Initializing needreload is quite natural from its
definition. (Still it is possible that an extra reload happens if
a reload is requested just after the initialization...)

> > Basically, I don't think this patch is ready to commit.  We have
> > neither evidence that it is necessary nor evidence that it is
> > complete.  I don't want to be unfair, here, but it seems to me you
> > ought to do a little more legwork before throwing this over the wall
> > to the pool of committers.
> 
> Fair enough. Is the analysis written above more convincing?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Simon Riggs

Date:

15 June 2017, 11:04:44

On 15 June 2017 at 02:59, Noah Misch <noah@leadboat.com> wrote:

> Formally, the causative commit is the one that removed the superuser() test,
> namely 25fff40.
>
> [Action required within three days.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
>
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Thanks for letting me know. I'm on leave at present, so won't be able
to get to this until June 20, though will make it a priority for then.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Noah Misch

Date:

22 June 2017, 08:52:50

On Thu, Jun 15, 2017 at 09:04:44AM +0100, Simon Riggs wrote:
> On 15 June 2017 at 02:59, Noah Misch <noah@leadboat.com> wrote:
> 
> > Formally, the causative commit is the one that removed the superuser() test,
> > namely 25fff40.
> >
> > [Action required within three days.  This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> > since you committed the patch believed to have created it, you own this open
> > item.  If some other commit is more relevant or if this does not belong as a
> > v10 open item, please let us know.  Otherwise, please observe the policy on
> > open item ownership[1] and send a status update within three calendar days of
> > this message.  Include a date for your subsequent status update.  Testers may
> > discover new open items at any time, and I want to plan to get them all fixed
> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
> > toward speedy resolution.  Thanks.
> >
> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> Thanks for letting me know. I'm on leave at present, so won't be able
> to get to this until June 20, though will make it a priority for then.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Noah Misch

Date:

24 June 2017, 08:21:44

On Wed, Jun 21, 2017 at 10:52:50PM -0700, Noah Misch wrote:
> On Thu, Jun 15, 2017 at 09:04:44AM +0100, Simon Riggs wrote:
> > On 15 June 2017 at 02:59, Noah Misch <noah@leadboat.com> wrote:
> > 
> > > Formally, the causative commit is the one that removed the superuser() test,
> > > namely 25fff40.
> > >
> > > [Action required within three days.  This is a generic notification.]
> > >
> > > The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> > > since you committed the patch believed to have created it, you own this open
> > > item.  If some other commit is more relevant or if this does not belong as a
> > > v10 open item, please let us know.  Otherwise, please observe the policy on
> > > open item ownership[1] and send a status update within three calendar days of
> > > this message.  Include a date for your subsequent status update.  Testers may
> > > discover new open items at any time, and I want to plan to get them all fixed
> > > well in advance of shipping v10.  Consequently, I will appreciate your efforts
> > > toward speedy resolution.  Thanks.
> > >
> > > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> > 
> > Thanks for letting me know. I'm on leave at present, so won't be able
> > to get to this until June 20, though will make it a priority for then.
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-06-25 06:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Alvaro Herrera

Date:

25 June 2017, 08:11:13

Noah Misch wrote:

> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
> for your status update.  Please reacquaint yourself with the policy on open
> item ownership[1] and then reply immediately.  If I do not hear from you by
> 2017-06-25 06:00 UTC, I will transfer this item to release management team
> ownership without further notice.

I volunteer to own this item.  Next update before Wednesday 28th 19:00 CLT.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

25 June 2017, 11:38:27

On Sun, Jun 25, 2017 at 2:11 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Noah Misch wrote:
>
>> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
>> for your status update.  Please reacquaint yourself with the policy on open
>> item ownership[1] and then reply immediately.  If I do not hear from you by
>> 2017-06-25 06:00 UTC, I will transfer this item to release management team
>> ownership without further notice.
>
> I volunteer to own this item.  Next update before Wednesday 28th 19:00 CLT.

Thanks Álvaro.
--
Michael

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Alvaro Herrera

Date:

29 June 2017, 01:52:14

Michael Paquier wrote:

> On Thu, Jun 8, 2017 at 3:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > I think if you're going to fix it so that we take spinlocks on
> > MyWalSnd in a bunch of places that we didn't take them before, it
> > would make sense to fix all the places where we're accessing those
> > fields without a spinlock instead of only some of them, unless there
> > are good reasons why we only need it in some cases and not others, in
> > which case I think the patch should add comments to each place where
> > the lock was not taken explaining why it's OK.  It didn't take me and
> > my copy of vi very long to find instances that you didn't change.
> 
> I take that you are referring to the two lookups in
> WalSndWaitForWal(), one in exec_replication_command(), one in
> WalSndLoop() and one in WalSndDone(). First let's remember that all
> the fields protected by the spinlock are only updated in a WAL sender
> process, so reading them directly is safe in this context: we know
> that no other processes will write them. And all those functions are
> called only in the context of a WAL sender process. So the copies of
> MyWalSnd that you are referring to here don't need a spinlock. Is
> there something I am missing?

I assume you mean "reading them unlocked" when you write "reading them
directly".  If so, I agree with that rationale; I verified that it
applies to members state, sentPtr, write, sent, flush, writeLag,
sentLag, flushLag.  I wonder if there's a maintainability danger: as
soon as we add a write to those members in a process other than the
walsender itself, we have a bug.  I don't yet have an opinion on the
severity of that problem.

Other struct members such as needreload are written by other processes,
so they should be always accessed with mutex held.  Regarding pid, it
seems easiest if we use the rule that it must also be always accessed
with spinlock held.  I propose updating the comment on WalSnd to this:

/** Each walsender has a WalSnd struct in shared memory.** This struct is protected by 'mutex', with two exceptions;
oneis* sync_standby_priority as noted below.  The other exception is that some* members are only written by the
walsenderprocess itself, and thus that* process is free to read those members without holding spinlock.  pid and*
needreloadalways require the spinlock to be held for all accesses.*/

> Actually, perhaps it would make sense to add some Assert(am_walsender)
> in this code?

I don't think that's needed, since MyWalSnd will only be valid in a
walsender.

I think I'm done with the walsender half of this patch; I still need to
review the walreceiver part.  I will report back tomorrow 19:00 CLT.

Currently, I'm considering not to backpatch any of this.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

29 June 2017, 03:34:28

On Thu, Jun 29, 2017 at 7:52 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
>> On Thu, Jun 8, 2017 at 3:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I take that you are referring to the two lookups in
>> WalSndWaitForWal(), one in exec_replication_command(), one in
>> WalSndLoop() and one in WalSndDone(). First let's remember that all
>> the fields protected by the spinlock are only updated in a WAL sender
>> process, so reading them directly is safe in this context: we know
>> that no other processes will write them. And all those functions are
>> called only in the context of a WAL sender process. So the copies of
>> MyWalSnd that you are referring to here don't need a spinlock. Is
>> there something I am missing?
>
> I assume you mean "reading them unlocked" when you write "reading them
> directly".

Both expressions sound the same to me. So yes I meant that.

> If so, I agree with that rationale; I verified that it
> applies to members state, sentPtr, write, sent, flush, writeLag,
> sentLag, flushLag.  I wonder if there's a maintainability danger: as
> soon as we add a write to those members in a process other than the
> walsender itself, we have a bug.  I don't yet have an opinion on the
> severity of that problem.
>
> Other struct members such as needreload are written by other processes,
> so they should be always accessed with mutex held.  Regarding pid, it
> seems easiest if we use the rule that it must also be always accessed
> with spinlock held.  I propose updating the comment on WalSnd to this:
>
> /*
>  * Each walsender has a WalSnd struct in shared memory.
>  *
>  * This struct is protected by 'mutex', with two exceptions; one is
>  * sync_standby_priority as noted below.  The other exception is that some
>  * members are only written by the walsender process itself, and thus that
>  * process is free to read those members without holding spinlock.  pid and
>  * needreload always require the spinlock to be held for all accesses.
>  */

Agreed, a comment seems appropriate to me. That's in line with what
Horiguchi-san has mentioned upthread.

> I think I'm done with the walsender half of this patch; I still need to
> review the walreceiver part.  I will report back tomorrow 19:00 CLT.

Thanks!

> Currently, I'm considering not to backpatch any of this.

Considering how crazy the conditions to make the information fetched
by users inconsistent are met, I agree with that.
-- 
Michael

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Noah Misch

Date:

30 June 2017, 06:43:58

On Wed, Jun 28, 2017 at 06:52:14PM -0400, Alvaro Herrera wrote:
> I think I'm done with the walsender half of this patch; I still need to
> review the walreceiver part.  I will report back tomorrow 19:00 CLT.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Alvaro Herrera

Date:

30 June 2017, 07:52:06

Noah Misch wrote:
> On Wed, Jun 28, 2017 at 06:52:14PM -0400, Alvaro Herrera wrote:
> > I think I'm done with the walsender half of this patch; I still need to
> > review the walreceiver part.  I will report back tomorrow 19:00 CLT.
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Real life took over.  I'll report before tomorrow same time.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Alvaro Herrera

Date:

01 July 2017, 01:20:43

Michael Paquier wrote:
> On Thu, Jun 29, 2017 at 7:52 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:

> > I think I'm done with the walsender half of this patch; I still need to
> > review the walreceiver part.  I will report back tomorrow 19:00 CLT.
> 
> Thanks!
> 
> > Currently, I'm considering not to backpatch any of this.
> 
> Considering how crazy the conditions to make the information fetched
> by users inconsistent are met, I agree with that.

Pushed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

03 July 2017, 08:05:46

On Sat, Jul 1, 2017 at 7:20 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
>> Considering how crazy the conditions to make the information fetched
>> by users inconsistent are met, I agree with that.
>
> Pushed.

Thanks Álvaro for pushing the patch. I had a second look and the
result looks good to me.

-       SpinLockAcquire(&walsnd->mutex);
+       }
+       pid = walsnd->pid;
The WAL receiver code used a cast to (int) in
pg_stat_get_wal_receiver(). I don't recall adding it. Why not being
consistent for both by removing the one of the WAL receiver code?

> In passing, clean up some leftover braces which were used to create
> unconditional blocks.  Once upon a time these were used for
> volatile-izing accesses to those shmem structs, which is no longer
> required.  Many other occurrences of this pattern remain.

Here are the places where a cleanup can happen:
- WalSndSetState
- ProcessStandbyReplyMessage
- XLogRead, 2 places
- XLogSendLogical
- WalRcvWaitForStartPosition
- WalRcvDie
- XLogWalRcvFlush
- ProcessWalSndrMessage
In most of the places of the WAL sender, braces could be removed to
improve the style. For the WAL receiver, declarations are not
necessary. As a matter of style, why not cleaning up just the WAL
sender stuff? Changing the WAL receiver code just to remove some
declarations would not improve readability, and would make back-patch
more difficult.
--
Michael

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Alvaro Herrera

Date:

03 July 2017, 19:14:55

Michael Paquier wrote:

> Thanks Álvaro for pushing the patch. I had a second look and the
> result looks good to me.
> 
> -       SpinLockAcquire(&walsnd->mutex);
> +       }
> +       pid = walsnd->pid;
> The WAL receiver code used a cast to (int) in
> pg_stat_get_wal_receiver(). I don't recall adding it.

I added it because it made me a bit nervous to pass a pid_t to
DatumGetInt32.  This one is assigning to a variable of type pid_t, so it
doesn't need a cast.

I'm not too clear on using pid_t variables as int32 Datum-packed
variables.  We don't do it a lot in the backend code (I found some
occurrences in contrib, but these don't inspire me a lot of confidence.)

> Why not being consistent for both by removing the one of the WAL
> receiver code?

I can't think of any reason to remove that cast.  It serves as
documentation if nothing else -- it alerts the reader that something is
going on.

> > In passing, clean up some leftover braces which were used to create
> > unconditional blocks.  Once upon a time these were used for
> > volatile-izing accesses to those shmem structs, which is no longer
> > required.  Many other occurrences of this pattern remain.
> 
> Here are the places where a cleanup can happen:
> - WalSndSetState
> - ProcessStandbyReplyMessage
> - XLogRead, 2 places
> - XLogSendLogical
> - WalRcvWaitForStartPosition
> - WalRcvDie
> - XLogWalRcvFlush
> - ProcessWalSndrMessage
> In most of the places of the WAL sender, braces could be removed to
> improve the style. For the WAL receiver, declarations are not
> necessary. As a matter of style, why not cleaning up just the WAL
> sender stuff? Changing the WAL receiver code just to remove some
> declarations would not improve readability, and would make back-patch
> more difficult.

I think we should clean this up whenever we're modifying the surrounding
code, but otherwise we can leave well enough alone.  It's not a high
priority item at any rate.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Noah Misch

Date:

04 July 2017, 07:34:46

On Mon, Jul 03, 2017 at 12:14:55PM -0400, Alvaro Herrera wrote:
> Michael Paquier wrote:

> > > In passing, clean up some leftover braces which were used to create
> > > unconditional blocks.  Once upon a time these were used for
> > > volatile-izing accesses to those shmem structs, which is no longer
> > > required.  Many other occurrences of this pattern remain.
> > 
> > Here are the places where a cleanup can happen:
> > - WalSndSetState
> > - ProcessStandbyReplyMessage
> > - XLogRead, 2 places
> > - XLogSendLogical
> > - WalRcvWaitForStartPosition
> > - WalRcvDie
> > - XLogWalRcvFlush
> > - ProcessWalSndrMessage
> > In most of the places of the WAL sender, braces could be removed to
> > improve the style. For the WAL receiver, declarations are not
> > necessary. As a matter of style, why not cleaning up just the WAL
> > sender stuff? Changing the WAL receiver code just to remove some
> > declarations would not improve readability, and would make back-patch
> > more difficult.
> 
> I think we should clean this up whenever we're modifying the surrounding
> code, but otherwise we can leave well enough alone.  It's not a high
> priority item at any rate.

Bundling code cleanup into commits that also do something else is strictly
worse than bundling whitespace cleanup, which is itself bad:
https://postgr.es/m/flat/20160113144826.GB3379802@tornado.leadboat.com

Deferring cleanup and pushing cleanup-only commits are each good options.

Re: [HACKERS] Race conditions with WAL sender PID lookups

From

Michael Paquier

Date:

07 July 2017, 09:02:28

On Tue, Jul 4, 2017 at 1:34 PM, Noah Misch <noah@leadboat.com> wrote:
> Bundling code cleanup into commits that also do something else is strictly
> worse than bundling whitespace cleanup, which is itself bad:
> https://postgr.es/m/flat/20160113144826.GB3379802@tornado.leadboat.com

FWIW, I agree with that. I favor as well separate commits for cleanups
and for fixes, so as each commit has its own goal and protects it.

(The cleanups discussed on this thread have been partially done in
commit 572d6ee where a bug has been fixed, not by me on the patches I
submitted ;) )
-- 
Michael