Re: the s_lock_stuck on perform_spin_delay - Mailing list pgsql-hackers

From Andy Fan
Subject Re: the s_lock_stuck on perform_spin_delay
Date
Msg-id 87edeexznn.fsf@163.com
Whole thread Raw
In response to Re: the s_lock_stuck on perform_spin_delay  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: the s_lock_stuck on perform_spin_delay
List pgsql-hackers
Hi Robert,

Thanks for your attention!

> On Thu, Jan 18, 2024 at 7:56 AM Andy Fan <zhihuifan1213@163.com> wrote:
>> Do you still have interest with making some progress on this topic?
>
> Some, but it's definitely not my top priority. I wish I could give as
> much attention as everyone would like to everyone's patches, but I
> can't even come close.

Your point is fair enough.

After reading your comments, I decide to talk more before sending the
next version.

>
> I think that the stack that you set up in START_SPIN_LOCK() is crazy.
> That would only be necessary if it were legal to acquire multiple
> spinlocks at the same time, which it definitely isn't. Also, doing
> memory allocation and deallocation here appears highly undesirable -
> even if we did need to support multiple spinlocks, it would be better
> to handle this using something like the approach we already use for
> lwlocks, where there is a fixed size array and we blow up if it
> overflows.

I wanted to disallow to acquire multiple spinlocks at the same time in
the first version, but later I thought that is beyond of the scope of
this patch. Now I prefer to disallow that. if there is no objection in
the following days, I will do this in next version. After this, we don't
need malloc at all.

>
> ASSERT_NO_SPIN_LOCK() looks odd, because I would expect it to be
> spelled Assert(!AnySpinLockHeld()). But looking deeper, I see that it
> doesn't internally Assert() but rather does something else. Maybe the
> name needs some workshopping. SpinLockMustNotBeHeldHere()?
> VerifyNoSpinLocksHeld()?

Yes, it is not a Assert since I want to provide more information about
where the SpinLock was held. Assert doesn't have such capacity but
elog(PANIC, ...) can put more information before the PANIC.

VerifyNoSpinLocksHeld looks much more professional than
ASSERT_NO_SPIN_LOCK; I will use this in the next version.


> I think we should check that no spinlock is held in a few additional
> places: the start of SpinLockAcquire(), and the start of errstart().

Agreed.

> You added an #include to dynahash.c despite making no other changes to
> the file.

That's mainly because I put the code into miscadmin.h and spin.h depends
on miscadmin.h with MACROs.

>
> I don't know whether the choice to treat buffer header locks as
> spinlocks is correct. It seems like it at least deserves a comment,
> and possibly some discussion on this mailing list about whether that's
> the right idea. I'm not sure that we have all the same restrictions
> for buffer header locks as we do for spinlocks in general, but I'm
> also not sure that we don't.

The LockBufHdr also used init_local_spin_delay / perform_spin_delay
infrastruce and then it has the same issue like ${subject}, it is pretty
like the code in s_lock; Based on my current knowledge, I think we
should add the check there.

>
> On a related note, the patch overall has 0 comments. I don't know that
> it needs a lot, but 0 isn't many at all.

hmm, I tried to write a good commit message, but comments do need some
improvement, thanks for highlighting this!

>
> miscadmin.h doesn't seem like a good place for this. It's a
> widely-included header file and these checks should be needed in
> relatively few places; also, they're not really related to most of
> what's in that file, IIRC.

they were put into spin.h in v1 but later move to miscadmin.h at [1].

> I also wonder why we're using macros instead of static inline
> functions.

START_SPIN_LOCK need to be macro since it use __FILE__ and __LINE__ to
note where the SpinLock is held. for others, just for consistent
purpose. I think they can be changed to inline function, at least for
VerifyNoSpinLocksHeld.

[1]
https://www.postgresql.org/message-id/CAEze2WggP-2Dhocmdhp-LxBzic%3DMXRgGA_tmv1G_9n-PDt2MQg%40mail.gmail.com

--
Best Regards
Andy Fan




pgsql-hackers by date:

Previous
From: Dean Rasheed
Date:
Subject: Re: MERGE ... RETURNING
Next
From: Alexander Lakhin
Date:
Subject: BUG: Former primary node might stuck when started as a standby