Home > mailing lists

Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiverafter OOM - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiverafter OOM
Date	October 3, 2017 03:54:36
Msg-id	CAB7nPqQW1h71DG4F4mKTCPVfb1sRxMAGeTLi-gEAn0mtcD3Y9Q@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
List	pgsql-hackers

Tree view

On Tue, Oct 3, 2017 at 6:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> If this is the only problem then I'd agree we should stick to a spinlock
>> (I assume the strings in question can't be very long).  I was thinking
>> more about what to do if we find other violations that are harder to fix.

I don't think that there is any need to switch to a LWLock. Any issues
in need to be dealt with here don't require it, if we are fine with
the memcpy method of course.

> I took a quick look through walreceiver.c, and there are a couple of
> obvious problems of the same ilk in WalReceiverMain, which is doing this:
>
>         walrcv->lastMsgSendTime = walrcv->lastMsgReceiptTime = walrcv->latestWalEndTime = GetCurrentTimestamp();
>
> (ie, a potential kernel call) inside a spinlock.  But there seems no
> real problem with just collecting the timestamp before we enter that
> critical section.

No problems seen either from here.

> I also don't especially like the fact that just above there it reaches
> elog(PANIC) with the lock still held, though at least that's a case that
> should never happen.

This part has been around since the beginning in 1bb2558. I agree that
the lock should be released before doing the logging.

> Further down, it's doing a pfree() inside the spinlock, apparently
> for no other reason than to save one "if (tmp_conninfo)".

Check.

> I don't especially like the Asserts inside spinlocks, either.  Personally,
> I think if those conditions are worth testing then they're worth testing
> for real (in production).  Variables that are manipulated by multiple
> processes are way more likely to assume unexpected states than local
> variables.

Those could be replaced by similar checks using some
PG_USED_FOR_ASSERTS_ONLY out of the spin lock sections, though I am
not sure if those are worth worrying. What do others think?

> I'm also rather befuddled by the fact that this code sets and clears
> walrcv->latch outside the critical sections.  If that field is used
> by any other process, surely that's completely unsafe.  If it isn't,
> why is it being kept in shared memory?

Do you mean the introduction of WalRcvForceReply by 314cbfc? This is
more recent, and has been discussed during the review of the
remote_apply patch here to avoid sending SIGUSR1 too much from the
startup process to the WAL receiver:
https://www.postgresql.org/message-id/CA+TgmobPsROS-gFk=_KJdW5scQjcKtpiLhsH9Cw=UWH1htFKaw@mail.gmail.com

I am attaching a patch that addresses the bugs for the spin lock sections.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

walreceiver-spin-calls.patch

pgsql-hackers by date:

From: Tatsuo Ishii
Date: 03 October 2017, 03:42:19
Subject: [HACKERS] Conversion error

From: Michael Paquier
Date: 03 October 2017, 04:09:38
Subject: Re: [HACKERS] 64-bit queryId?

Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiverafter OOM - Mailing list pgsql-hackers

Attachment

Previous

Next