Re: [HACKERS] Race conditions with WAL sender PID lookups - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Race conditions with WAL sender PID lookups
Date
Msg-id CAB7nPqRa3EpFedbW8RuHpmSn8u_fdzV3oCRhRndFqWfELSvHVg@mail.gmail.com
In response to Re: [HACKERS] Race conditions with WAL sender PID lookups  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: [HACKERS] Race conditions with WAL sender PID lookups  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: [HACKERS] Race conditions with WAL sender PID lookups  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Thu, May 18, 2017 at 1:43 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> I had my eyes on the WAL sender code this morning, and I have noticed
>> that walsender.c is not completely consistent in the PID lookups it
>> does. In two code paths, the PID value is checked
>> without holding the WAL sender spin lock (WalSndRqstFileReload and
>> pg_stat_get_wal_senders), which looks like a very bad idea, contrary to
>> what the new WalSndWaitStopping() does and what InitWalSenderSlot() has
>> been doing for ages.
>
> There is also code that accesses shared walsender state without
> spinlocks over in syncrep.c.  I think that file could use a few words
> of explanation for why it's OK to access pid, state and flush without
> synchronisation.

Yes, that state is read during the quorum and priority sync evaluation.
Except for sync_standby_priority, all of those variables should be
protected by the WAL sender's spin lock; walsender_private.h is clear
about that. So the current coding is inconsistent there as well.
Attached is an updated patch.
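
To illustrate the access pattern discussed above, here is a minimal,
self-contained sketch. It is not the attached patch and not PostgreSQL
code: the DemoWalSnd struct, the demo_* helpers and the C11 atomic_flag
used as a stand-in spin lock are all hypothetical names made up for this
example. The idea is simply to copy pid, state and flush into local
variables while the lock is held, release the lock, and only then act on
the copies.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one WAL sender slot in shared memory. */
typedef struct DemoWalSnd
{
    atomic_flag mutex;  /* stand-in spin lock protecting the fields below */
    int         pid;    /* 0 means the slot is unused (assumption) */
    int         state;
    uint64_t    flush;  /* flush position reported by the standby */
} DemoWalSnd;

/* Local copy of the protected fields, safe to use without the lock. */
typedef struct DemoWalSndSnapshot
{
    int      pid;
    int      state;
    uint64_t flush;
} DemoWalSndSnapshot;

/* Busy-wait until the flag is acquired. */
static void
demo_lock(DemoWalSnd *w)
{
    while (atomic_flag_test_and_set_explicit(&w->mutex, memory_order_acquire))
        ;
}

static void
demo_unlock(DemoWalSnd *w)
{
    atomic_flag_clear_explicit(&w->mutex, memory_order_release);
}

/* Set up a slot; the lock starts out released. */
static void
demo_walsnd_init(DemoWalSnd *w, int pid, int state, uint64_t flush)
{
    atomic_flag_clear(&w->mutex);
    w->pid = pid;
    w->state = state;
    w->flush = flush;
}

/*
 * Copy the protected fields under the lock and release it right away.
 * Returns true if the slot was in use (pid != 0) at the time of the copy.
 */
static bool
demo_walsnd_read(DemoWalSnd *w, DemoWalSndSnapshot *out)
{
    demo_lock(w);
    out->pid = w->pid;
    out->state = w->state;
    out->flush = w->flush;
    demo_unlock(w);

    return out->pid != 0;
}
```

A caller would invoke demo_walsnd_read() once per slot and then test the
snapshot's pid, rather than reading w->pid directly. Keeping the locked
section down to a few plain assignments is what makes a spin lock a
reasonable choice for this kind of lookup.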
-- 
Michael

