Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Date
Msg-id 1409.1506981260@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiverafter OOM
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiverafter OOM
List pgsql-hackers
I wrote:
> If this is the only problem then I'd agree we should stick to a spinlock
> (I assume the strings in question can't be very long).  I was thinking
> more about what to do if we find other violations that are harder to fix.

I took a quick look through walreceiver.c, and there are a couple of
obvious problems of the same ilk in WalReceiverMain, which is doing this:
walrcv->lastMsgSendTime = walrcv->lastMsgReceiptTime = walrcv->latestWalEndTime = GetCurrentTimestamp();

(ie, a potential kernel call) inside a spinlock.  But there seems no
real problem with just collecting the timestamp before we enter that
critical section.

I also don't especially like the fact that just above there it reaches
elog(PANIC) with the lock still held, though at least that's a case that
should never happen.

Further down, it's doing a pfree() inside the spinlock, apparently
for no other reason than to save one "if (tmp_conninfo)".

I don't especially like the Asserts inside spinlocks, either.  Personally,
I think if those conditions are worth testing then they're worth testing
for real (in production).  Variables that are manipulated by multiple
processes are way more likely to assume unexpected states than local
variables.

I'm also rather befuddled by the fact that this code sets and clears
walrcv->latch outside the critical sections.  If that field is used
by any other process, surely that's completely unsafe.  If it isn't,
why is it being kept in shared memory?
        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.